Seventh Computational and Data Science school for HEP (CoDaS-HEP 2025)
from
Monday 21 July 2025 (08:30)
to
Friday 25 July 2025 (14:00)
Monday 21 July 2025
08:30
Breakfast
08:30 - 09:00
Room: 407 Jadwin Hall
09:00
Welcome and Overview - Peter Elmer (Princeton University (US))
09:00 - 09:10
Room: 407 Jadwin Hall
09:10
Collaborative Software Development with Git(Hub) - David Lange (Princeton University (US))
09:10 - 10:30
Room: 407 Jadwin Hall
Git is perhaps the single biggest common denominator among software developers, regardless of field or programming language. It serves not only as a version control system but as the backbone of collaborative software development. This session aims to be 100% hands-on and at least 90% collaborative. We will work exclusively in the browser, using GitHub and GitHub Codespaces. Learn forking, branching, opening pull requests, handling merge conflicts, and more. A GitHub account is required.
10:30
Coffee Break
10:30 - 11:00
Room: 407 Jadwin Hall
11:00
Getting connected to our compute platform - David Lange (Princeton University (US))
11:00 - 11:30
Room: 407 Jadwin Hall
11:30
What Every Computational Physicist Should Know About Computer Architecture - Steven R Lantz (Cornell University (US))
11:30 - 12:30
Room: 407 Jadwin Hall
These days, everyone in physics is a computational physicist in one way or another. Experiments, theory, and (obviously) simulations all rely heavily on computers. Isn't it time you got to know them better? Computer architecture is an interesting study in its own right, and how well one understands and uses the capabilities of today's systems can have real implications for how fast your computational work gets done. Let's dig in, learn some terminology, and find out what's in there.
12:30
Lunch
12:30 - 13:30
Room: 407 Jadwin Hall
13:30
The Scientific Python Ecosystem - Manfred Peter Fackeldey (Princeton University (US)), Henry Fredrick Schreiner (Princeton University), Andres Rios-Tascon (Princeton University)
13:30 - 15:30
Room: 407 Jadwin Hall
In recent years, Python has become a glue language for scientific computing. Although code written in pure Python is generally slow, it interfaces well with compiled C code and shares a common data abstraction through NumPy. Most data processing, statistical, and machine learning software offers a Python interface as a matter of course. This tutorial will introduce you to core Python packages for science, such as NumPy, SciPy, Matplotlib, Pandas, and Numba (part 1), as well as HEP-specific tools like iminuit, particle, pyjet, and pyhf (part 2). We'll especially focus on accessing ROOT data with uproot and awkward. Part 1 will also cover the Scientific Python Development Guide and include a short discussion on packaging.
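As a taste of the array-at-a-time style that part 1 covers, here is a minimal NumPy sketch (the momentum values below are made up purely for illustration):

```python
import numpy as np

# Transverse momentum components for a few hypothetical particles.
px = np.array([1.0, 2.0, 3.0, 4.0])
py = np.array([0.5, 1.5, 2.5, 3.5])

# One vectorized expression replaces an explicit Python loop;
# the actual work happens in compiled C inside NumPy.
pt = np.sqrt(px**2 + py**2)

# Equivalent (but much slower for large arrays) pure-Python loop:
pt_loop = np.array([(x**2 + y**2) ** 0.5 for x, y in zip(px, py)])

assert np.allclose(pt, pt_loop)
```

The same pattern scales from four elements to millions with no change to the code, which is what makes NumPy the common data abstraction for the rest of the ecosystem.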
15:30
Coffee Break
15:30 - 16:00
Room: 407 Jadwin Hall
16:00
The Scientific Python Ecosystem - Manfred Peter Fackeldey (Princeton University (US)), Henry Fredrick Schreiner (Princeton University), Andres Rios-Tascon (Princeton University)
16:00 - 17:30
Room: 407 Jadwin Hall
Continued from last time. Part 2 focuses on the HEP portion of the ecosystem.
18:00
Welcome Reception - Lewis Library Atrium
18:00 - 20:30
Tuesday 22 July 2025
08:30
Breakfast
08:30 - 09:00
Room: 407 Jadwin Hall
09:00
An Introduction to Parallel Programming with OpenMP - Tim Mattson (Intel-Retired)
09:00 - 11:00
Room: 407 Jadwin Hall
We introduce parallel systems and the fundamental concepts needed to write parallel software. As much as possible, we cover these concepts by writing OpenMP code. By the time we're done, you'll not only understand the key ideas behind parallel programming in general, but you'll also have a deep understanding of the most commonly used elements of OpenMP. We'll start with how to manipulate threads directly for shared address space systems (such as multicore CPUs).
11:00
Jadwin hall plaza
11:00 - 11:10
Room: 407 Jadwin Hall
11:10
Coffee Break
11:10 - 11:30
Room: 407 Jadwin Hall
11:30
OpenMP and parallel programming beyond shared memory CPUs - Tim Mattson (Intel-Retired)
11:30 - 13:00
Room: 407 Jadwin Hall
We continue with OpenMP by exploring: (1) how data is managed in shared memory systems and (2) task level parallelism. We’ll finish our journey into parallel programming with a high-level discussion of how the concepts we’ve learned with OpenMP map onto GPU programming and programming distributed memory systems using MPI. The goal is a solid working knowledge of OpenMP and a high level understanding of parallel programming beyond shared memory, CPU-based systems.
13:00
Lunch
13:00 - 14:00
Room: 407 Jadwin Hall
14:00
Introduction to Machine Learning - Liv Helen Vage (Princeton University (US))
14:00 - 15:30
Room: 407 Jadwin Hall
In this session we cover the basics of machine learning. We look at gradient descent and a few simple ML models including decision trees. Whether you're a complete beginner or have done a lot of ML, there will be something for everyone.
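As a rough sketch of the gradient-descent idea the session covers (the toy data, learning rate, and iteration count below are our own illustrative choices, not from the course material):

```python
import numpy as np

# Toy data: y = 3x plus a little noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 3.0 * x + 0.01 * rng.standard_normal(50)

# Fit the model y = w * x by gradient descent on the mean squared error.
w = 0.0
lr = 0.5  # learning rate
for _ in range(200):
    grad = 2.0 * np.mean((w * x - y) * x)  # d(MSE)/dw
    w -= lr * grad

# The fitted slope should land close to the true value of 3.
assert abs(w - 3.0) < 0.1
```

Each iteration takes a step downhill along the gradient of the loss; the same loop structure, with more parameters and automatic differentiation, underlies the training of the neural networks covered later in the week.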
15:30
Coffee Break
15:30 - 16:00
Room: 407 Jadwin Hall
16:00
Machine learning - Neural networks & Kaggle - Liv Helen Vage (Princeton University (US))
16:00 - 18:00
Room: 407 Jadwin Hall
Almost all advanced ML methods use neural networks. We look at why that is and how they work. We also introduce a Kaggle competition, which you will work on during the week. This will give you experience with a real-world ML problem.
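For orientation, here is a bare-bones forward pass of a small fully connected network in NumPy (the layer sizes and random weights are illustrative; real training code would add a loss function and backpropagation):

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(z):
    # The standard rectified-linear activation: max(z, 0) elementwise.
    return np.maximum(z, 0.0)

# One hidden layer: 4 inputs -> 8 hidden units -> 1 output.
# Weights are random here; training would adjust them.
W1 = rng.standard_normal((4, 8)) * 0.1
b1 = np.zeros(8)
W2 = rng.standard_normal((8, 1)) * 0.1
b2 = np.zeros(1)

def forward(x):
    """Forward pass for a batch of inputs with shape (n, 4)."""
    hidden = relu(x @ W1 + b1)
    return hidden @ W2 + b2

batch = rng.standard_normal((5, 4))
out = forward(batch)
assert out.shape == (5, 1)
```

Stacking more such layers, and swapping the dense matrix multiplies for convolutions or attention, gives the architectures discussed later in the week.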
18:30
Reception - Prospect House
18:30 - 21:00
Wednesday 23 July 2025
08:30
Breakfast
08:30 - 09:00
Room: 407 Jadwin Hall
09:00
Floating Point Arithmetic Is Not Real - Tim Mattson (Intel)
09:00 - 10:00
Room: 407 Jadwin Hall
10:00
The Use and Abuse of Random Numbers - Tim Mattson (Human Learning Group)
10:00 - 11:00
Room: 407 Jadwin Hall
11:00
Coffee Break
11:00 - 11:30
Room: 407 Jadwin Hall
11:30
Vector Parallelism on Multi-Core Processors - Steven R Lantz (Cornell University (US))
11:30 - 12:15
Room: 407 Jadwin Hall
All modern CPUs boost their performance through vector processing units (VPUs). VPUs are activated through special SIMD instructions that load multiple numbers into extra-wide registers and operate on them simultaneously. Intel's latest processors feature a plethora of 512-bit vector registers, as well as 1 or 2 VPUs per core, each of which can operate on 16 floats or 8 doubles in every cycle. Typically these SIMD gains are achieved not by the programmer directly, but by (a) the compiler through automatic vectorization of simple loops in the source code, or (b) function calls to highly vectorized performance libraries. Either way, vectorization is a significant component of parallel performance on CPUs, and to maximize performance, it is important to consider how well one's code is vectorized. We will take a look at vector hardware, then turn to simple code examples that illustrate how compiler-generated vectorization works.
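The kind of simple loop a compiler can auto-vectorize can be illustrated from Python with NumPy, whose element-wise operations run in precompiled loops of exactly this shape (a SAXPY-style example of our own):

```python
import numpy as np

n = 1024
a = 2.5
x = np.arange(n, dtype=np.float64)
y = np.ones(n)

# Scalar form: one element per iteration. In C, a compiler would
# typically auto-vectorize this loop with SIMD instructions.
out_scalar = np.empty(n)
for i in range(n):
    out_scalar[i] = a * x[i] + y[i]

# Array form: NumPy dispatches the whole operation to a compiled
# loop, so the SIMD gains come "for free" from the library.
out_vector = a * x + y

assert np.allclose(out_scalar, out_vector)
```

This corresponds to route (b) in the abstract: rather than vectorizing the loop yourself, you call into a library whose inner loops have already been vectorized.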
12:15
Performance Case Study: the mkFit Particle Tracking Code - Steven R Lantz (Cornell University (US))
12:15 - 13:00
Room: 407 Jadwin Hall
In this presentation, we consider how a physics application may be restructured to take better advantage of vectorization and multithreading. For vectorization, we focus on the Matriplex concept that is used to implement parallel Kalman filtering in our collaboration's particle tracking R&D project called mkFit. Drastic changes to data structures and loops were required to help the compiler find the SIMD opportunities in the algorithm. For multithreading, we examine how binning detector hits and tracks in an abstraction of the detector geometry enabled track candidates to be processed in bunches. We conclude by looking at how Intel VTune and Advisor, together with simple test codes, played a role in identifying and resolving trouble spots that affected performance. The mkFit code is now part of the production software for CMS in LHC Run 3, and a version of mkFit is under development for the Phase 2 CMS detector.
13:00
Lunch
13:00 - 14:00
Room: 407 Jadwin Hall
14:00
Columnar Data Analysis - Massimiliano Galli (Princeton University (US)), Ianna Osborne (Princeton University)
14:00 - 15:30
Room: 407 Jadwin Hall
Array-oriented data analysis languages and libraries, such as NumPy, MATLAB, R, IDL, and APL, are typically interactive with an array-at-a-time interface. Instead of performing an entire analysis in a single loop, each step in the calculation is a separate pass, letting the user inspect distributions along the way. Unfortunately, these systems are limited to primitive data types: mostly numbers and booleans. Variable-length and nested data structures, such as different numbers of particles per event, don't fit this model. Fortunately, the model can be extended. This tutorial will introduce awkward-array, the concepts of columnar data structures, and how to use them in data analysis, such as computing combinatorics (quantities depending on combinations of particles) without any for loops.
15:30
Coffee Break
15:30 - 16:00
Room: 407 Jadwin Hall
16:00
Columnar Data Analysis - Massimiliano Galli (Princeton University (US)), Ianna Osborne (Princeton University)
16:00 - 18:00
Room: 407 Jadwin Hall
19:00
Dinner on your own
19:00 - 21:00
Thursday 24 July 2025
08:30
Breakfast
08:30 - 09:00
Room: 407 Jadwin Hall
09:00
Machine Learning - Advanced neural net models - Liv Helen Vage (Princeton University (US))
09:00 - 10:30
Room: 407 Jadwin Hall
There are many flavours of neural nets, depending on the problem we are trying to solve. This lecture looks at convolutional neural nets, graph neural nets, and transformers. We also take a brief look at modern LLMs.
10:30
Coffee Break
10:30 - 11:00
Room: 407 Jadwin Hall
11:00
Machine Learning - Liv Helen Vage (Princeton University (US))
11:00 - 12:00
Room: 407 Jadwin Hall
12:00
GPU Programming - Brij Kishor Jashal (Rutherford Appleton Laboratory)
12:00 - 13:00
Room: 407 Jadwin Hall
13:00
Lunch
13:00 - 14:00
Room: 407 Jadwin Hall
14:00
GPU Programming - Brij Kishor Jashal (Rutherford Appleton Laboratory)
14:00 - 16:00
Room: 407 Jadwin Hall
16:00
Coffee Break
16:00 - 16:30
Room: 407 Jadwin Hall
16:30
GPU Programming - Brij Kishor Jashal (Rutherford Appleton Laboratory)
16:30 - 18:00
Room: 407 Jadwin Hall
18:30
BBQ and Drinks - Palmer House
18:30 - 21:30
Friday 25 July 2025
08:30
Breakfast
08:30 - 09:00
Room: 407 Jadwin Hall
09:00
You are qualified to be teachers! - Sudhir Malik (University of Puerto Rico (US))
09:00 - 09:10
Room: 407 Jadwin Hall
Join IRIS-HEP/HSF Training!
09:10
Machine Learning Wrapup - Liv Helen Vage (Princeton University (US))
09:10 - 10:10
Room: 407 Jadwin Hall
There will be a brief presentation of some of the current and ongoing research on ML in HEP. After this, we look at the results of the Kaggle competition.
10:10
Line Segment Tracking at the HL-LHC - Gavin Niendorf (Cornell University (US))
10:10 - 11:00
Room: 407 Jadwin Hall
11:00
Coffee Break
11:00 - 11:30
Room: 407 Jadwin Hall
11:30
Closing Session
11:30 - 12:30
Room: 407 Jadwin Hall