Fourth Computational and Data Science school for HEP (CoDaS-HEP 2022)
Monday 1 August 2022 (08:30) to Friday 5 August 2022 (13:00)
Monday 1 August 2022
08:30 - 09:00  Breakfast
Room: 407 Jadwin Hall
09:00 - 09:10  Welcome and Overview - Peter Elmer (Princeton University (US))
Room: 407 Jadwin Hall
09:10 - 10:30  Setup and Collaborative Programming - Kilian Lieret
Room: 407 Jadwin Hall
10:30 - 11:00  Coffee Break
Room: 407 Jadwin Hall
11:00 - 12:00  What Every Computational Physicist Should Know About Computer Architecture - Steven R Lantz (Cornell University (US))
Room: 407 Jadwin Hall
These days, everyone in physics is a computational physicist in one way or another. Experiments, theory, and (obviously) simulations all rely heavily on computers. Isn't it time you got to know them better? Computer architecture is an interesting study in its own right, and how well one understands and uses the capabilities of today's systems has real implications for how fast your computational work gets done. Let's dig in, learn some terminology, and find out what's in there.
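The effect of the memory hierarchy is easy to observe even from Python. A minimal sketch (assuming only NumPy; the array size is arbitrary): a strided sum does an eighth of the arithmetic of a full sum, yet touches every cache line, so it usually takes far more than an eighth of the time.

```python
import timeit
import numpy as np

x = np.random.rand(2**24)  # 128 MiB of doubles, much larger than any cache

t_unit   = timeit.timeit(lambda: x.sum(),      number=10)
t_stride = timeit.timeit(lambda: x[::8].sum(), number=10)

# A stride of 8 doubles (64 bytes) lands on one element per cache line,
# so this sum does 1/8 of the arithmetic yet loads every cache line the
# full sum does: memory traffic, not arithmetic, is the bottleneck.
print(f"unit stride: {t_unit:.3f} s   stride 8: {t_stride:.3f} s")
```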
12:00 - 12:30  Vector Parallelism on Multi-Core Processors - Steven R Lantz (Cornell University (US))
Room: 407 Jadwin Hall
All modern CPUs boost their performance through vector processing units (VPUs). VPUs are activated through special SIMD instructions that load multiple numbers into extra-wide registers and operate on them simultaneously. Intel's latest processors feature a plethora of 512-bit vector registers, as well as 1 or 2 VPUs per core, each of which can operate on 16 floats or 8 doubles in every cycle. Typically these SIMD gains are achieved not by the programmer directly, but by (a) the compiler through automatic vectorization of simple loops in the source code, or (b) function calls to highly vectorized performance libraries. Either way, vectorization is a significant component of parallel performance on CPUs, and to maximize performance, it is important to consider how well one's code is vectorized. We will take a look at vector hardware, then turn to simple code examples that illustrate how compiler-generated vectorization works.
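The examples in the talk target compiled code, where the compiler emits the SIMD instructions; as a rough Python-side illustration of the same idea (assuming only NumPy), an array-at-a-time expression hands the loop to NumPy's precompiled, vectorized kernels instead of the interpreter:

```python
import timeit
import numpy as np

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

def interpreted_loop():
    # one interpreter iteration per element: no SIMD, lots of overhead
    return [a[i] * b[i] + 1.0 for i in range(len(a))]

def array_at_a_time():
    # a single pass through a compiled (and typically SIMD) loop
    return a * b + 1.0

print("loop :", timeit.timeit(interpreted_loop, number=3))
print("numpy:", timeit.timeit(array_at_a_time, number=3))
```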
12:30 - 13:30  Lunch
Room: 407 Jadwin Hall
13:30 - 15:00  The Scientific Python Ecosystem - Henry Fredrick Schreiner (Princeton University)
Room: 407 Jadwin Hall
In recent years, Python has become a glue language for scientific computing. Although code written in Python is generally slow, it connects well to compiled C code and offers a common data abstraction through Numpy. Most data processing, statistical, and machine learning software has a Python interface as a matter of course. This tutorial will introduce you to core Python packages for science, such as Numpy, SciPy, Matplotlib, Pandas, and Numba, as well as HEP-specific tools like iminuit, particle, pyjet, and pyhf. We'll especially focus on accessing ROOT data in uproot and awkward.
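As a small taste of that workflow, here is a hedged sketch; the file name events.root and the tree and branch names Events and Muon_pt are hypothetical placeholders, not a dataset provided by the school:

```python
import matplotlib.pyplot as plt
import uproot  # reads ROOT files without needing ROOT itself

# Hypothetical file, tree, and branch names, purely for illustration.
tree = uproot.open("events.root")["Events"]
pt = tree["Muon_pt"].array(library="np")  # a flat branch, as a NumPy array

plt.hist(pt, bins=50)
plt.xlabel("muon $p_T$ [GeV]")
plt.savefig("muon_pt.png")
```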
15:00 - 15:30  Coffee Break
Room: 407 Jadwin Hall
15:30 - 17:30  The Scientific Python Ecosystem - Henry Fredrick Schreiner (Princeton University)
Room: 407 Jadwin Hall
18:00 - 20:30  Welcome Reception
Tuesday 2 August 2022
08:00 - 08:30  Breakfast
Room: 407 Jadwin Hall
08:30 - 09:00  The Use and Abuse of Random Numbers - David Lange (Princeton University (US))
Room: 407 Jadwin Hall
09:00 - 10:30  Floating Point Arithmetic Is Not Real - Bei Wang (Princeton University)
Room: 407 Jadwin Hall
10:30 - 10:40  Group Photo - Jadwin Hall plaza
10:40 - 11:00  Coffee Break
Room: 407 Jadwin Hall
11:00 - 11:30  Vector Parallelism on Multi-Core Processors (continued) - Steven R Lantz (Cornell University (US))
Room: 407 Jadwin Hall
11:30 - 12:00  Introduction to Performance Tuning & Optimization Tools - Steven R Lantz (Cornell University (US))
Room: 407 Jadwin Hall
Improving the performance of scientific code is often considered an art: difficult, mysterious, and time-consuming. It doesn't have to be. Performance tuning and optimization tools can greatly aid in evaluating and understanding the performance of scientific code. In this talk we will discuss how to approach performance tuning and introduce some measurement tools for evaluating compiled-language (C/C++/Fortran) code. Powerful profiling tools, such as Intel VTune and Advisor, will be introduced and discussed.
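VTune and Advisor target compiled code, but the workflow they support, measure first and let the profile point at the hotspot, is language-agnostic. A minimal sketch with Python's standard-library profiler (the functions here are made up for illustration):

```python
import cProfile
import pstats

def hotspot(n=200_000):
    # deliberately slow: accumulating in an interpreted loop
    total = 0.0
    for i in range(n):
        total += i ** 0.5
    return total

def analysis():
    return sum(hotspot() for _ in range(10))

# Profile before optimizing; the report ranks functions by time spent.
cProfile.run("analysis()", "prof.out")
pstats.Stats("prof.out").sort_stats("cumulative").print_stats(5)
```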
12:00 - 12:30  Performance Case Study: the mkFit Particle Tracking Code - Steven R Lantz (Cornell University (US))
Room: 407 Jadwin Hall
In this case study, we consider how a physics application may be restructured to take better advantage of vectorization. In particular, we focus on the Matriplex concept that is used to implement parallel Kalman filtering in our collaboration's particle tracking R&D project called mkFit. The mkFit code is now part of the production software for CMS in LHC Run 3. Drastic changes to data structures and loops were required to help the compiler find the SIMD opportunities in the algorithm. We conclude by looking at how Intel VTune and Advisor, together with simple test codes, played a role in identifying and resolving trouble spots that affected performance.
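Matriplex itself is a C++ template library, but the core idea, storing element (i, j) of many small matrices contiguously so that one SIMD instruction advances many Kalman filters at once, can be sketched in NumPy (sizes and names here are illustrative, not mkFit's actual code):

```python
import numpy as np

N = 10_000  # number of track candidates processed together

# Conventional "array of structures": N separate 3x3 matrices.
A = np.random.rand(N, 3, 3)
B = np.random.rand(N, 3, 3)

# Matriplex-style "structure of arrays": element (i, j) of all N
# matrices sits contiguously along the last axis, shape (3, 3, N).
A_mplex = np.ascontiguousarray(np.moveaxis(A, 0, -1))
B_mplex = np.ascontiguousarray(np.moveaxis(B, 0, -1))

# The same batched matrix multiply either way; in the SOA layout the
# innermost loop runs along the long contiguous N axis, ideal for SIMD.
C_aos = A @ B
C_soa = np.einsum("ijn,jkn->ikn", A_mplex, B_mplex)

assert np.allclose(np.moveaxis(C_soa, -1, 0), C_aos)
```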
12:30 - 13:30  Lunch
Room: 407 Jadwin Hall
13:30 - 15:00  Machine Learning: Introduction to Machine Learning, Decision Trees - Savannah Jennifer Thais (Princeton University (US)) and Adrian Alan Pol (Princeton University (US))
Room: 407 Jadwin Hall
15:00 - 15:30  Coffee Break
Room: 407 Jadwin Hall
15:30 - 17:30  Machine Learning: Introduction to Deep Learning, Convolutional Neural Networks - Savannah Jennifer Thais (Princeton University (US)) and Adrian Alan Pol (Princeton University (US))
Room: 407 Jadwin Hall
With vast amounts of data and ever-increasing computing power, the last decade saw an explosion of deep learning applications to real-world problems, especially those involving images. Deep learning is increasingly adopted in high-energy physics as well. We will go through the basics of neural networks: how we train them and how they make predictions. We will cover the basic building blocks of modern solutions. In the hands-on tutorial, you will learn how to use deep learning for jet tagging with high- or low-level input data.
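The agenda does not pin down the tutorial's framework, so as a hedged sketch, here is a minimal convolutional classifier in PyTorch; the 1-channel 32x32 "jet image" input and the two-class output are hypothetical stand-ins for the tutorial's actual dataset:

```python
import torch
from torch import nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # learn local 3x3 filters
    nn.ReLU(),
    nn.MaxPool2d(2),                            # 32x32 -> 16x16
    nn.Flatten(),
    nn.Linear(8 * 16 * 16, 2),                  # signal vs background logits
)

x = torch.randn(64, 1, 32, 32)       # a batch of fake "jet images"
y = torch.randint(0, 2, (64,))       # fake labels

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = nn.functional.cross_entropy(model(x), y)  # forward pass + loss
loss.backward()                                  # backward pass: gradients
opt.step()                                       # one weight update
```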
18:30 - 20:30  Social Mixer - Prospect House
Wednesday 3 August 2022
08:00 - 08:30  Breakfast
Room: 407 Jadwin Hall
08:30 - 10:00  Machine Learning: Introduction to Graph Neural Networks - Adrian Alan Pol (Princeton University (US)) and Savannah Jennifer Thais (Princeton University (US))
Room: 407 Jadwin Hall
10:00 - 10:30  Coffee Break
Room: 407 Jadwin Hall
10:30 - 12:30  Machine Learning: Unsupervised Machine Learning, Autoencoders - Savannah Jennifer Thais (Princeton University (US)) and Adrian Alan Pol (Princeton University (US))
Room: 407 Jadwin Hall
Not all machine learning problems are created equal. Some tasks come with no labels at all, and require either discovering similarities between data points or spotting anomalies. Clustering is the important task of grouping similar data points together. Dimensionality reduction helps with understanding an input space that has many features. An autoencoder is a type of neural network that learns an encoding of unlabeled data; autoencoders can be used for noise removal and dimensionality reduction. They can also generate new data from an arbitrary encoding, but for that we need to learn the distribution of the latent code, and this is where variational autoencoders come in handy. In this tutorial, you will write your own clustering algorithm, use a dimensionality-reduction algorithm to visualize and understand the data, and train a variational autoencoder to generate new data.
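As a hedged sketch of the reconstruction idea (a plain autoencoder, not the variational one from the tutorial; PyTorch assumed, sizes arbitrary):

```python
import torch
from torch import nn

# Compress 100 input features into a 2-dimensional latent code and back.
encoder = nn.Sequential(nn.Linear(100, 32), nn.ReLU(), nn.Linear(32, 2))
decoder = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 100))

x = torch.randn(256, 100)  # unlabeled data: no targets required
opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=1e-3)

for _ in range(100):
    z = encoder(x)                                 # latent code
    loss = nn.functional.mse_loss(decoder(z), x)   # reconstruction error
    opt.zero_grad()
    loss.backward()
    opt.step()

# Points that reconstruct poorly are candidate anomalies; a variational
# autoencoder additionally constrains the distribution of z so that new
# samples can be drawn from it.
```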
12:30 - 13:30  Lunch
Room: 407 Jadwin Hall
13:30 - 15:00  Columnar Data Analysis - Ioana Ifrim (Princeton University (US)) and Jim Pivarski (Princeton University)
Room: 407 Jadwin Hall
Data analysis languages, such as Numpy, MATLAB, R, IDL, and ADL, are typically interactive with an array-at-a-time interface. Instead of performing an entire analysis in a single loop, each step in the calculation is a separate pass, letting the user inspect distributions at each step of the way. Unfortunately, these languages are limited to primitive data types: mostly numbers and booleans. Variable-length and nested data structures, such as different numbers of particles per event, don't fit this model. Fortunately, the model can be extended. This tutorial will introduce awkward-array, the concepts of columnar data structures, and how to use them in data analysis, such as computing combinatorics (quantities depending on combinations of particles) without any for loops.
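As a quick taste of combinatorics without for loops (the muon values below are made up):

```python
import awkward as ak

# A jagged array: a different number of muon pTs in each event.
pt = ak.Array([[40.1, 25.7, 10.2], [], [32.5, 18.9]])

pairs = ak.combinations(pt, 2)  # every unique pair within each event
p1, p2 = ak.unzip(pairs)
pair_sum = p1 + p2              # still jagged, still no Python loop

has_two = ak.num(pt) >= 2       # boolean mask: events with >= 2 muons
print(pair_sum[has_two].tolist())
```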
15:00 - 15:30  Coffee Break
Room: 407 Jadwin Hall
15:30 - 17:30  Columnar Data Analysis - Ioana Ifrim (Princeton University (US)) and Jim Pivarski (Princeton University)
Room: 407 Jadwin Hall
18:00 - 20:00  Dinner on your own
Thursday 4 August 2022
08:00 - 08:30  Breakfast
Room: 407 Jadwin Hall
08:30 - 10:30  Parallel Programming - An introduction to parallel computing with OpenMP - Tim Mattson (Intel)
Room: 407 Jadwin Hall
We start with a discussion of the historical roots of parallel computing and how they appear in a modern context. We'll then use OpenMP and a series of hands-on exercises to explore the fundamental concepts behind parallel programming.
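OpenMP itself is an API for C, C++, and Fortran, so its examples live outside Python; purely as an analogy, the fork-join pattern at its heart looks like this with the standard-library thread pool (the GIL limits real speedup for pure-Python work, so only the pattern, not the performance, carries over):

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # the work each parallel worker performs on its share of the data
    return sum(v * v for v in chunk)

data = list(range(1_000_000))
nthreads = 4
chunks = [data[i::nthreads] for i in range(nthreads)]

# Fork: hand each chunk to a worker thread. Join: wait for all of them
# and combine the results, which OpenMP's "reduction" clause automates.
with ThreadPoolExecutor(max_workers=nthreads) as pool:
    total = sum(pool.map(partial_sum, chunks))
print(total)
```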
10:30 - 11:00  Coffee Break
Room: 407 Jadwin Hall
11:00 - 12:30  Parallel Programming - The OpenMP Common Core - Tim Mattson (Intel)
Room: 407 Jadwin Hall
Through hands-on exercises, we will explore the common core of OpenMP: the features of the API that most OpenMP programmers use in all their parallel programs. This will provide a foundation of understanding you can build on as you explore the more advanced features of OpenMP.
12:30 - 13:30  Lunch
Room: 407 Jadwin Hall
13:30 - 15:00  Parallel Programming - Working with OpenMP - Tim Mattson (Intel)
Room: 407 Jadwin Hall
We'll explore more complex OpenMP problems and get a feel for how to work with OpenMP in real applications.
15:00 - 15:30  Coffee Break
Room: 407 Jadwin Hall
15:30 - 17:00  Parallel Programming - The world beyond OpenMP - Tim Mattson (Intel)
Room: 407 Jadwin Hall
Parallel programming is hard. There is no way to avoid that reality. We can mitigate these difficulties by focusing on the fundamental design patterns from which most parallel algorithms are constructed. Once mastered, these patterns make it much easier to understand how your problems map onto other parallel programming models. Hence for our last session on parallel programming, we'll review these essential design patterns as seen in OpenMP, and then show how they appear in cluster computing (with MPI) and GPGPU computing (with OpenCL and a bit of CUDA).
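For the MPI part, the mpi4py package mirrors the C API closely; a minimal sketch of the same reduction pattern distributed across processes (run under mpiexec):

```python
# Save as hello_mpi.py and run with:  mpiexec -n 4 python hello_mpi.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()  # this process's id, 0 .. size-1
size = comm.Get_size()  # total number of processes

# Each rank sums a strided share of the work...
partial = sum(i * i for i in range(rank, 100_000, size))

# ...and a reduction combines the partial results on rank 0: the same
# pattern as an OpenMP reduction, but across separate processes.
total = comm.reduce(partial, op=MPI.SUM, root=0)
if rank == 0:
    print("total =", total)
```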
18:00 - 20:00  School Dinner - Nassau Club
Friday 5 August 2022
08:30 - 09:00  Breakfast
Room: 407 Jadwin Hall
09:00 - 09:45  Things you didn't know you needed - Kilian Lieret and Henry Fredrick Schreiner (Princeton University)
Room: 407 Jadwin Hall
09:45 - 10:30  Example Application: Line Segment Tracking - Tres Reid (Cornell University (US))
Room: 407 Jadwin Hall
10:30 - 11:00  Coffee Break
Room: 407 Jadwin Hall
11:00 - 12:30  Closing Session
Room: 407 Jadwin Hall