Fifth Computational and Data Science school for HEP (CoDaS-HEP 2023)
Monday 17 July 2023 (08:30) to Friday 21 July 2023 (13:00)
Monday 17 July 2023
08:30 - 09:00
Breakfast
Room: 407 Jadwin Hall
09:00 - 09:10
Welcome and Overview - Peter Elmer (Princeton University (US))
Room: 407 Jadwin Hall
09:10 - 10:30
Collaborative Software Development with Git(Hub) - Kilian Lieret (Princeton University)
Room: 407 Jadwin Hall
Git is perhaps the greatest common denominator among software developers, regardless of field or programming language. It serves not only as a version control system but as the backbone of all collaborative software development. This session aims to be 100% hands-on and at least 90% collaborative. We will work exclusively in the browser, using GitHub and GitHub Codespaces. Learn forking, branching, opening pull requests, handling merge conflicts, and more. A GitHub account is required.
10:30 - 11:00
Coffee Break
Room: 407 Jadwin Hall
11:00 - 11:45
What Every Computational Physicist Should Know About Computer Architecture - Steven R Lantz (Cornell University (US))
Room: 407 Jadwin Hall
These days, everyone in physics is a computational physicist in one way or another. Experiments, theory, and (obviously) simulations all rely heavily on computers. Isn't it time you got to know them better? Computer architecture is an interesting study in its own right, and how well you understand and use the capabilities of today's systems can have real implications for how fast your computational work gets done. Let's dig in, learn some terminology, and find out what's in there.
11:45 - 12:30
Vector Parallelism on Multi-Core Processors - Steven R Lantz (Cornell University (US))
Room: 407 Jadwin Hall
All modern CPUs boost their performance through vector processing units (VPUs). VPUs are activated through special SIMD instructions that load multiple numbers into extra-wide registers and operate on them simultaneously. Intel's latest processors feature a plethora of 512-bit vector registers, as well as 1 or 2 VPUs per core, each of which can operate on 16 floats or 8 doubles in every cycle. Typically these SIMD gains are achieved not by the programmer directly, but by (a) the compiler through automatic vectorization of simple loops in the source code, or (b) function calls to highly vectorized performance libraries. Either way, vectorization is a significant component of parallel performance on CPUs, and to maximize performance, it is important to consider how well one's code is vectorized. We will take a look at vector hardware, then turn to simple code examples that illustrate how compiler-generated vectorization works.
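As an illustration of what compiler-generated vectorization means in practice, here is a minimal, hypothetical C++ sketch (not taken from the lecture materials) of the kind of simple loop a compiler can auto-vectorize; the flags in the comments are the standard GCC/Clang options for enabling optimization and reporting vectorization.

```cpp
// saxpy.cpp -- a minimal, hypothetical example of the kind of simple loop a
// compiler can auto-vectorize (not taken from the lecture materials).
// Compile with e.g.  g++ -O3 -march=native saxpy.cpp
// and add -fopt-info-vec (GCC) or -Rpass=loop-vectorize (Clang)
// to see the compiler's vectorization report.
#include <cstdio>
#include <vector>

// y[i] = a*x[i] + y[i]: independent iterations with unit-stride access,
// exactly the pattern the auto-vectorizer looks for.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    for (std::size_t i = 0; i < x.size(); ++i)
        y[i] = a * x[i] + y[i];
}

int main() {
    std::vector<float> x(1 << 20, 1.0f), y(1 << 20, 2.0f);
    saxpy(3.0f, x, y);
    std::printf("y[0] = %f\n", y[0]);   // expect 5.000000
    return 0;
}
```

Because the iterations are independent and the data is contiguous, the compiler is free to process many elements per SIMD instruction (up to 16 floats at a time on 512-bit hardware), with no change to the source code.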
12:30 - 13:30
Lunch
Room: 407 Jadwin Hall
13:30 - 15:00
Parallel Programming - An introduction to parallel computing with OpenMP - Tim Mattson (Intel)
Room: 407 Jadwin Hall
We start with a discussion of the historical roots of parallel computing and how they appear in a modern context. We'll then use OpenMP and a series of hands-on exercises to explore the fundamental concepts behind parallel programming.
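For orientation, here is a minimal "hello world" parallel region of the kind a first OpenMP exercise typically starts from; this is an illustrative sketch, not the course's exercise code.

```cpp
// hello_omp.cpp -- a minimal parallel-region example in the spirit of a
// first OpenMP exercise (illustrative; not the course's exercise code).
// Compile with e.g.  g++ -fopenmp hello_omp.cpp
#include <cstdio>
#include <omp.h>

int main() {
    // Fork a team of threads; each thread executes the block independently.
    #pragma omp parallel
    {
        int id = omp_get_thread_num();        // this thread's rank in the team
        int nthreads = omp_get_num_threads(); // total threads in the team
        std::printf("hello from thread %d of %d\n", id, nthreads);
    }
    return 0;
}
```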
15:00 - 15:30
Coffee Break
Room: 407 Jadwin Hall
15:30 - 17:30
Parallel Programming - The OpenMP Common Core - Tim Mattson (Intel)
Room: 407 Jadwin Hall
Through hands-on exercises we will explore the common core of OpenMP; that is, the features of the API that most OpenMP programmers use in all their parallel programs. This will provide a foundation of understanding that you can build on as you explore the more advanced features of OpenMP.
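To make the "common core" concrete, here is a sketch of the classic numerical-integration-of-pi example, a staple of OpenMP tutorials, using the work-sharing loop and reduction constructs; it is shown for illustration only and is not necessarily the exercise code used in the session.

```cpp
// pi_omp.cpp -- the classic numerical-integration-of-pi example, a staple of
// OpenMP tutorials, sketched here to illustrate the work-sharing loop and
// reduction constructs of the common core (illustration only).
// Compile with e.g.  g++ -O2 -fopenmp pi_omp.cpp
#include <cstdio>

int main() {
    const long num_steps = 100000000;
    const double step = 1.0 / num_steps;
    double sum = 0.0;

    // Split the loop across threads; reduction(+:sum) gives each thread a
    // private partial sum and combines them safely at the end.
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < num_steps; ++i) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }

    std::printf("pi ~= %.12f\n", step * sum);
    return 0;
}
```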
18:00 - 19:30
Welcome Light Reception
Tuesday 18 July 2023
08:00 - 08:30
Breakfast
Room: 407 Jadwin Hall
08:30 - 10:30
Parallel Programming - Working with OpenMP - Tim Mattson (Intel)
Room: 407 Jadwin Hall
We now know how to work with threads directly and how to parallelize loops with OpenMP directives. Next we move on to managing the data environment. Our hands-on exercises will be much more complicated as we explore how to debug multithreaded programs. We then turn to task-level parallelism in OpenMP and wrap up with a look at the core design patterns of OpenMP.
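As a taste of the data-environment and tasking topics, here is a hypothetical sketch (not the session's exercise code) of the familiar recursive Fibonacci example parallelized with OpenMP tasks; the shared clause on the task is exactly the kind of data-sharing detail this session covers.

```cpp
// fib_tasks.cpp -- a sketch of OpenMP tasking plus an explicit data-sharing
// clause (hypothetical example, not the session's exercise code).
// Compile with e.g.  g++ -O2 -fopenmp fib_tasks.cpp
#include <cstdio>

long fib(long n) {
    if (n < 2) return n;
    long a, b;
    // Spawn a child task for one branch of the recursion.  'a' is a local
    // variable, so it would default to firstprivate inside the task; the
    // shared(a) clause makes the task's result visible to the parent.
    // The if clause stops task creation once the subproblems get small.
    #pragma omp task shared(a) if (n > 20)
    a = fib(n - 1);
    b = fib(n - 2);          // the parent computes the other branch itself
    #pragma omp taskwait     // wait for the child task before using 'a'
    return a + b;
}

int main() {
    long result = 0;
    #pragma omp parallel
    #pragma omp single       // one thread builds the task tree; the rest of
    result = fib(30);        // the team executes tasks from the queue
    std::printf("fib(30) = %ld\n", result);
    return 0;
}
```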
10:30 - 10:40
Group Photo - Jadwin Hall plaza
10:40 - 11:00
Coffee Break
Room: 407 Jadwin Hall
11:00 - 12:30
Parallel Programming - The world beyond OpenMP - Tim Mattson (Intel)
Room: 407 Jadwin Hall
Parallel programming is hard. There is no way to avoid that reality. We can mitigate these difficulties by focusing on the fundamental design patterns from which most parallel algorithms are constructed. Once mastered, these patterns make it much easier to understand how your problems map onto other parallel programming models. Hence for our last session on parallel programming, we'll review these essential design patterns as seen in OpenMP, and then show how they appear in cluster computing (with MPI) and GPGPU computing (with OpenMP and then a quick survey of other GPGPU languages).
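To hint at how the same data-parallel pattern carries over beyond multicore CPUs, here is a hypothetical sketch of a GPU offload using OpenMP's target constructs (illustrative only; the session may use different examples). An MPI version of the same pattern would instead split the index range across ranks and combine results with messages.

```cpp
// target_omp.cpp -- the same data-parallel pattern expressed as a GPU
// offload with OpenMP's target constructs (hypothetical sketch; the session
// may use different examples).
// Compile with an offload-capable compiler, e.g.  g++ -O2 -fopenmp target_omp.cpp
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 20;
    std::vector<float> x(n, 1.0f), y(n, 2.0f);
    float* xp = x.data();
    float* yp = y.data();

    // Copy the arrays to the device, distribute the loop over GPU threads,
    // and copy y back when the region ends.  Without an offload device the
    // region simply falls back to running on the host.
    #pragma omp target teams distribute parallel for \
            map(to : xp[0:n]) map(tofrom : yp[0:n])
    for (int i = 0; i < n; ++i)
        yp[i] = 3.0f * xp[i] + yp[i];

    std::printf("y[0] = %f\n", yp[0]);   // expect 5.000000
    return 0;
}
```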
12:30 - 13:30
Lunch
Room: 407 Jadwin Hall
13:30 - 15:00
The Scientific Python Ecosystem - Henry Fredrick Schreiner (Princeton University)
Room: 407 Jadwin Hall
In recent years, Python has become a glue language for scientific computing. Although code written in pure Python is generally slow, it connects well to compiled C code and shares a common data abstraction through NumPy. Most data processing, statistical, and machine learning software has a Python interface as a matter of course. This tutorial will introduce you to core Python packages for science, such as NumPy, SciPy, Matplotlib, Pandas, and Numba (part 1), as well as HEP-specific tools like iminuit, particle, pyjet, and pyhf (part 2). We'll especially focus on accessing ROOT data with uproot and awkward. Part 1 will also cover the Scientific Python Development Guide and a short discussion of packaging.
15:00 - 15:30
Coffee Break
Room: 407 Jadwin Hall
15:30 - 17:30
The Scientific Python Ecosystem - Henry Fredrick Schreiner (Princeton University)
Room: 407 Jadwin Hall
Continued from last time. Part 2 focuses on the HEP portion of the ecosystem.
18:30 - 21:00
BBQ and Drinks - Palmer House
Wednesday 19 July 2023
08:00 - 08:30
Breakfast
Room: 407 Jadwin Hall
08:30 - 09:30
Floating Point Arithmetic Is Not Real - Ianna Osborne (Princeton University)
Room: 407 Jadwin Hall
09:30 - 10:30
The Use and Abuse of Random Numbers - David Lange (Princeton University (US))
Room: 407 Jadwin Hall
10:30 - 11:00
Coffee Break
Room: 407 Jadwin Hall
11:00 - 11:30
Vector Parallelism on Multi-Core Processors (continued) - Steven R Lantz (Cornell University (US))
Room: 407 Jadwin Hall
11:30 - 12:00
Introduction to Performance Tuning & Optimization Tools - Steven R Lantz (Cornell University (US))
Room: 407 Jadwin Hall
Improving the performance of scientific code is often considered an art: difficult, mysterious, and time-consuming. It doesn't have to be. Performance tuning and optimization tools can greatly aid in evaluating and understanding the performance of scientific code. In this talk we will discuss how to approach performance tuning and introduce measurement tools for evaluating compiled-language (C/C++/Fortran) code, including powerful profilers such as Intel VTune and Advisor.
12:00 - 12:30
Performance Case Study: the mkFit Particle Tracking Code - Steven R Lantz (Cornell University (US))
Room: 407 Jadwin Hall
This is a demo of how to run various analyses with Intel Advisor and see what they reveal about hotspots in the current version of the mkFit particle tracking code; these hotspots may represent opportunities for improving the code's performance.
12:30 - 13:30
Lunch
Room: 407 Jadwin Hall
13:30 - 15:00
Machine Learning: Introduction to Machine Learning - Adrian Alan Pol (Princeton University (US)), Abhijith Gandrakota (Fermi National Accelerator Lab. (US))
Room: 407 Jadwin Hall
15:00 - 15:30
Coffee Break
Room: 407 Jadwin Hall
15:30 - 17:30
Machine Learning: Supervised Deep Learning - Adrian Alan Pol (Princeton University (US)), Abhijith Gandrakota (Fermi National Accelerator Lab. (US))
Room: 407 Jadwin Hall
18:00 - 20:00
Dinner on your own
Thursday 20 July 2023
08:00 - 08:30
Breakfast
Room: 407 Jadwin Hall
08:30 - 10:00
Machine Learning: Convolutional Neural Networks and Autoencoders - Adrian Alan Pol (Princeton University (US)), Abhijith Gandrakota (Fermi National Accelerator Lab. (US))
Room: 407 Jadwin Hall
10:00 - 10:30
Coffee Break
Room: 407 Jadwin Hall
10:30 - 12:30
Machine Learning: Permutation Invariance - Abhijith Gandrakota (Fermi National Accelerator Lab. (US)), Adrian Alan Pol (Princeton University (US))
Room: 407 Jadwin Hall
12:30 - 13:30
Lunch
Room: 407 Jadwin Hall
13:30 - 15:00
Columnar Data Analysis - Jim Pivarski (Princeton University), Ioana Ifrim (Princeton University (US))
Room: 407 Jadwin Hall
Data analysis languages, such as NumPy, MATLAB, R, IDL, and ADL, are typically interactive with an array-at-a-time interface. Instead of performing an entire analysis in a single loop, each step in the calculation is a separate pass, letting the user inspect distributions each step of the way. Unfortunately, these languages are limited to primitive data types: mostly numbers and booleans. Variable-length and nested data structures, such as different numbers of particles per event, don't fit this model. Fortunately, the model can be extended. This tutorial will introduce awkward-array, the concepts behind columnar data structures, and how to use them in data analysis, for example to compute combinatorics (quantities depending on combinations of particles) without any for loops.
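To make "columnar" concrete for variable-length data, here is a small, hypothetical C++ sketch of the offsets-plus-content layout that libraries like awkward-array use to represent a jagged array (different numbers of particles per event) without per-event objects. The names and structure are illustrative, not awkward-array's actual internals; in the Python library, loops like the one below run inside compiled kernels, so the analyst writes array-at-a-time expressions instead.

```cpp
// jagged_sketch.cpp -- a toy "offsets + content" representation of a jagged
// array, the columnar layout behind libraries such as awkward-array.
// (Hypothetical illustration; not awkward-array's actual implementation.)
#include <cstdio>
#include <vector>

int main() {
    // Three events with 2, 0, and 3 particle pt values, stored flat:
    std::vector<double> pt_content = {51.2, 33.0, 12.5, 8.9, 101.7};
    // offsets[i] .. offsets[i+1] delimits event i inside the flat buffer.
    std::vector<std::size_t> pt_offsets = {0, 2, 2, 5};

    // "Max pt per non-empty event" as one pass over the columnar buffers,
    // with no per-event objects allocated anywhere.
    for (std::size_t ev = 0; ev + 1 < pt_offsets.size(); ++ev) {
        std::size_t begin = pt_offsets[ev], end = pt_offsets[ev + 1];
        if (begin == end) continue;                 // empty event
        double max_pt = pt_content[begin];
        for (std::size_t i = begin + 1; i < end; ++i)
            if (pt_content[i] > max_pt) max_pt = pt_content[i];
        std::printf("event %zu: max pt = %.1f\n", ev, max_pt);
    }
    return 0;
}
```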
15:00 - 15:30
Coffee Break
Room: 407 Jadwin Hall
15:30 - 17:30
Columnar Data Analysis - Ioana Ifrim (Princeton University (US)), Jim Pivarski (Princeton University)
Room: 407 Jadwin Hall
18:00 - 21:00
School Dinner - Frick Atrium and Patio
Friday 21 July 2023
08:30 - 09:00
Breakfast
Room: 407 Jadwin Hall
09:00 - 09:45
Things you didn't know you needed - Henry Fredrick Schreiner (Princeton University), Kilian Lieret (Princeton University)
Room: 407 Jadwin Hall
09:45 - 10:30
Parallelized Track Reconstruction for the LHC: the mkFit Project - Steven R Lantz (Cornell University (US))
Room: 407 Jadwin Hall
In this presentation, we consider how a physics application may be restructured to take better advantage of vectorization and multithreading. For vectorization, we focus on the Matriplex concept that is used to implement parallel Kalman filtering in our collaboration's particle tracking R&D project called mkFit. Drastic changes to data structures and loops were required to help the compiler find the SIMD opportunities in the algorithm. For multithreading, we examine how binning detector hits and tracks in an abstraction of the detector geometry enabled track candidates to be processed in bunches. We conclude by looking at how Intel VTune and Advisor, together with simple test codes, played a role in identifying and resolving trouble spots that affected performance. The mkFit code is now part of the production software for CMS in LHC Run 3.
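As background for the data-structure changes described above, here is a generic array-of-structures versus structure-of-arrays comparison in C++. It is only a toy analogue of the idea behind Matriplex (which packs whole matrices across vector lanes), not the actual mkFit code.

```cpp
// soa_sketch.cpp -- a generic array-of-structures (AoS) vs. structure-of-
// arrays (SoA) comparison; a toy analogue of the data-layout idea behind
// Matriplex, not the actual mkFit code.
#include <cstddef>
#include <vector>

// AoS: each track's parameters sit together, so a loop over tracks strides
// through memory and is awkward for the vectorizer.
struct TrackAoS { float x, y, z, pt; };

// SoA: the same parameter for many tracks is contiguous, so one SIMD
// instruction can update that parameter for several tracks at once.
struct TracksSoA {
    std::vector<float> x, y, z, pt;
};

// Unit-stride loop over a single parameter array: friendly to the compiler's
// auto-vectorizer.
void propagate_z(TracksSoA& tracks, float dz) {
    for (std::size_t i = 0; i < tracks.z.size(); ++i)
        tracks.z[i] += dz;
}

int main() {
    TracksSoA tracks;
    for (int i = 0; i < 1000; ++i) {
        tracks.x.push_back(0.0f);
        tracks.y.push_back(0.0f);
        tracks.z.push_back(static_cast<float>(i));
        tracks.pt.push_back(1.0f);
    }
    propagate_z(tracks, 5.0f);
    return 0;
}
```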
10:30 - 11:00
Coffee Break
Room: 407 Jadwin Hall
11:00 - 12:30
Closing Session
Room: 407 Jadwin Hall