Fourth Computational and Data Science school for HEP (CoDaS-HEP 2022)

US/Eastern
407 Jadwin Hall (Princeton University)

407 Jadwin Hall

Princeton University

Princeton Center For Theoretical Science (PCTS)
Description

The fourth school on tools, techniques and methods for Computational and Data Science for High Energy Physics (CoDaS-HEP 2022) will take place on 1-5 August, 2022, at Princeton University.

Advanced software is a critical ingredient to scientific research. Training young researchers in the latest tools and techniques is an essential part of developing the skills required for a successful career both in research and in industry.

The CoDaS-HEP school aims to provide a broad introduction to these critical skills as well as an overview of applications in High Energy Physics. Specific topics to be covered at the school include:

  • Parallel Programming 
  • Big Data Tools and Techniques
  • Machine Learning 
  • Practical skills like performance evaluation, collaborative use of git/github, etc.

The school offers a limited number of young researchers an opportunity to learn these skills from experienced scientists and instructors. Successful applicants will receive travel and lodging support to attend the school.

School website: http://codas-hep.org   

Applications for the CoDaS-HEP 2022 school are now being accepted. Please use this Google Form to apply. The deadline for application is 6 May, 2022. Applicants will be notified regarding acceptance and available travel support by 15 May.

The school lectures will take place in 407 Jadwin Hall, in the main lecture hall of the Princeton Center for Theoretical Science (PCTS).

This project is supported by National Science Foundation grants OAC-1829707, OAC-1829729 and OAC-1836650, the Princeton Institute for Computational Science and Engineering (PICSciE), the Princeton Physics Department, the Office of the Dean for Research of Princeton University and the Enrico Fermi Institute at the University of Chicago. Any opinions, findings, conclusions or recommendations expressed in this material are those of the developers and do not necessarily reflect the views of the National Science Foundation.

 

    • 08:30 09:00
      Breakfast 30m 407 Jadwin Hall

    • 09:00 09:10
      Welcome and Overview 10m 407 Jadwin Hall

      Speaker: Peter Elmer (Princeton University (US))
    • 09:10 10:30
      Setup and Collaborative Programming 1h 20m 407 Jadwin Hall

      Speaker: Kilian Lieret
    • 10:30 11:00
      Coffee Break 30m 407 Jadwin Hall

    • 11:00 12:00
      What Every Computational Physicist Should Know About Computer Architecture 1h 407 Jadwin Hall

      These days, everyone in physics is a computational physicist in one way or another. Experiments, theory, and (obviously) simulations all rely heavily on computers. Isn't it time you got to know them better? Computer architecture is an interesting study in its own right, and how well one understands and uses the capabilities of today's systems can have real implications for how fast your computational work gets done. Let's dig in, learn some terminology, and find out what's in there.

      Speaker: Steven R Lantz (Cornell University (US))
    • 12:00 12:30
      Vector Parallelism on Multi-Core Processors 30m 407 Jadwin Hall

      All modern CPUs boost their performance through vector processing units (VPUs). VPUs are activated through special SIMD instructions that load multiple numbers into extra-wide registers and operate on them simultaneously. Intel's latest processors feature a plethora of 512-bit vector registers, as well as 1 or 2 VPUs per core, each of which can operate on 16 floats or 8 doubles in every cycle. Typically these SIMD gains are achieved not by the programmer directly, but by (a) the compiler through automatic vectorization of simple loops in the source code, or (b) function calls to highly vectorized performance libraries. Either way, vectorization is a significant component of parallel performance on CPUs, and to maximize performance, it is important to consider how well one's code is vectorized. We will take a look at vector hardware, then turn to simple code examples that illustrate how compiler-generated vectorization works.
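
The lane counts quoted above follow from simple arithmetic on the register width. The minimal Python sketch below reproduces them. The function names are illustrative only and not part of any library, and it assumes 32-bit single-precision floats and 64-bit doubles.

```python
# Illustrative arithmetic only: how many numbers fit in one SIMD register.
REGISTER_BITS = 512  # width of one vector register, as quoted in the abstract


def simd_lanes(element_bits, register_bits=REGISTER_BITS):
    """Number of elements of the given size that fit in one register."""
    return register_bits // element_bits


def elements_per_cycle(element_bits, vpus_per_core):
    """Upper bound on elements one core can process per cycle (hypothetical helper)."""
    return simd_lanes(element_bits) * vpus_per_core


print(simd_lanes(32))             # floats per register  -> 16
print(simd_lanes(64))             # doubles per register -> 8
print(elements_per_cycle(64, 2))  # doubles per cycle with 2 VPUs -> 16
```

These are peak figures; whether real code approaches them depends on how well the compiler can vectorize its loops.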

      Speaker: Steven R Lantz (Cornell University (US))
    • 12:30 13:30
      Lunch 1h 407 Jadwin Hall

    • 13:30 15:00
      The Scientific Python Ecosystem 1h 30m 407 Jadwin Hall

      In recent years, Python has become a glue language for scientific computing. Although code written in Python is generally slow, it connects well to compiled C code and offers a common data abstraction through NumPy. Many data processing and statistical packages, and most machine learning software, have a Python interface as a matter of course.

      This tutorial will introduce you to core Python packages for science, such as NumPy, SciPy, Matplotlib, Pandas, and Numba, as well as HEP-specific tools like iminuit, particle, pyjet, and pyhf. We'll especially focus on accessing ROOT data with uproot and awkward.

      Speaker: Henry Fredrick Schreiner (Princeton University)
    • 15:00 15:30
      Coffee Break 30m 407 Jadwin Hall

    • 15:30 17:30
      The Scientific Python Ecosystem 2h 407 Jadwin Hall

      Speaker: Henry Fredrick Schreiner (Princeton University)
    • 18:00 20:30
      Welcome Reception 2h 30m Palmer House

    • 08:00 08:30
      Breakfast 30m 407 Jadwin Hall

    • 08:30 09:00
      The Use and Abuse of Random Numbers 30m 407 Jadwin Hall

      Speaker: David Lange (Princeton University (US))
    • 09:00 10:30
      Floating Point Arithmetic Is Not Real 1h 30m 407 Jadwin Hall

      Speaker: Bei Wang (Princeton University)
    • 10:30 10:40
      Group Photo - Jadwin Hall plaza 10m
    • 10:40 11:00
      Coffee Break 20m 407 Jadwin Hall

    • 11:00 11:30
      Vector Parallelism on Multi-Core Processors (continued) 30m 407 Jadwin Hall

      Speaker: Steven R Lantz (Cornell University (US))
    • 11:30 12:00
      Introduction to Performance Tuning & Optimization Tools 30m 407 Jadwin Hall

      Improving the performance of scientific code is something that is often considered to be an art that is difficult, mysterious, and time-consuming, but it doesn't have to be. Performance tuning and optimization tools can greatly aid in the evaluation and understanding of the performance of scientific code. In this talk we will discuss how to approach performance tuning and introduce some measurement tools to evaluate the performance of compiled-language (C/C++/Fortran) code. Powerful profiling tools, such as Intel VTune and Advisor, will be introduced and discussed.

      Speaker: Steven R Lantz (Cornell University (US))
    • 12:00 12:30
      Performance Case Study: the mkFit Particle Tracking Code 30m 407 Jadwin Hall

      In this case study, we consider how a physics application may be restructured to take better advantage of vectorization. In particular, we focus on the Matriplex concept that is used to implement parallel Kalman filtering in our collaboration's particle tracking R&D project called mkFit. The mkFit code is now part of the production software for CMS in LHC Run 3. Drastic changes to data structures and loops were required to help the compiler find the SIMD opportunities in the algorithm. We conclude by looking at how Intel VTune and Advisor, together with simple test codes, played a role in identifying and resolving trouble spots that affected performance.
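
To make the data-layout idea concrete, here is a minimal pure-Python sketch of a Matriplex-style layout. It is an illustration under stated assumptions, not the mkFit implementation: the helper names `pack`, `unpack`, and `multiply` are hypothetical, and Python itself does not vectorize the loop. The point is only that element (i, j) of all N matrices is stored contiguously, so the innermost loop runs with unit stride over the N matrices, which is the kind of loop a compiler can turn into SIMD instructions.

```python
def pack(matrices, dim):
    """Interleave N dim x dim matrices so element (i, j) of all N is contiguous."""
    n = len(matrices)
    plex = [0.0] * (dim * dim * n)
    for m, mat in enumerate(matrices):
        for i in range(dim):
            for j in range(dim):
                plex[(i * dim + j) * n + m] = float(mat[i][j])
    return plex


def unpack(plex, dim, n):
    """Recover the N individual matrices from the interleaved layout."""
    return [
        [[plex[(i * dim + j) * n + m] for j in range(dim)] for i in range(dim)]
        for m in range(n)
    ]


def multiply(a, b, dim, n):
    """Multiply N matrix pairs at once: C[m] = A[m] @ B[m] for every m."""
    c = [0.0] * (dim * dim * n)
    for i in range(dim):
        for j in range(dim):
            for k in range(dim):
                ia, ib, ic = (i * dim + k) * n, (k * dim + j) * n, (i * dim + j) * n
                # Unit-stride loop over the N matrices: the SIMD opportunity.
                for m in range(n):
                    c[ic + m] += a[ia + m] * b[ib + m]
    return c


a = pack([[[1, 2], [3, 4]], [[0, 1], [1, 0]]], dim=2)
b = pack([[[5, 6], [7, 8]], [[2, 0], [0, 2]]], dim=2)
products = unpack(multiply(a, b, dim=2, n=2), dim=2, n=2)
print(products)
```

In a compiled language the same loop nest, with N chosen to match the vector width, gives the compiler a simple loop it can vectorize automatically.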

      Speaker: Steven R Lantz (Cornell University (US))
    • 12:30 13:30
      Lunch 1h 407 Jadwin Hall

    • 13:30 15:00
      Machine Learning: Introduction to Machine Learning, Decision Trees 1h 30m 407 Jadwin Hall

      Speakers: Adrian Alan Pol (Princeton University (US)), Savannah Jennifer Thais (Princeton University (US))
    • 15:00 15:30
      Coffee Break 30m 407 Jadwin Hall

    • 15:30 17:30
      Machine Learning: Introduction to Deep Learning, Convolutional Neural Networks 2h 407 Jadwin Hall

      With the vast amount of data and increasing computing power, the last decade saw an explosion of deep learning applications for real-world problems, especially when working with images. Deep learning is increasingly adopted in the high-energy physics field.

      We will go through the basics of neural networks: how we train them and how they make predictions. We will cover the basic building blocks of modern solutions. In the hands-on tutorial, you will learn how to use deep learning for jet tagging with high- or low-level input data.
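
As a rough illustration of "train" and "predict", the sketch below fits a single sigmoid neuron with stochastic gradient descent using only the Python standard library. It is not the tutorial material: the toy dataset (the logical OR of two inputs), the learning rate, and the epoch count are assumptions chosen to keep the example small.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Forward pass: weighted sum of the inputs followed by a sigmoid."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(samples, labels, epochs=2000, learning_rate=0.5):
    """Stochastic gradient descent on the binary cross-entropy loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # For sigmoid + cross-entropy, the loss gradient with respect
            # to the pre-activation is simply (prediction - label).
            error = predict(weights, bias, x) - y
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


# Toy data (an assumption for illustration): logical OR of two inputs.
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
labels = [0, 1, 1, 1]

weights, bias = train(samples, labels)
predictions = [round(predict(weights, bias, x)) for x in samples]
print(predictions)
```

A deep network stacks many such units into layers and computes the gradients by backpropagation, but the loop of predict, compare, and update is the same.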

      Speakers: Adrian Alan Pol (Princeton University (US)), Savannah Jennifer Thais (Princeton University (US))
    • 18:30 20:30
      Social Mixer - Prospect House 2h

      Food and drinks at Prospect House

    • 08:00 08:30
      Breakfast 30m 407 Jadwin Hall

    • 08:30 10:00
      Machine Learning: Introduction to Graph Neural Networks 1h 30m 407 Jadwin Hall

      Speakers: Adrian Alan Pol (Princeton University (US)), Savannah Jennifer Thais (Princeton University (US))
    • 10:00 10:30
      Coffee Break 30m 407 Jadwin Hall

    • 10:30 12:30
      Machine Learning: Unsupervised Machine Learning, Autoencoders 2h 407 Jadwin Hall

      Not all machine learning problems are created equal. Some tasks require working without labels, either to discover similarities between data points or to spot anomalies. Clustering is the task of grouping similar data together. Dimensionality reduction helps with understanding an input space that has many features. An autoencoder is a type of neural network that aims to learn an encoding of unlabeled data. Autoencoders can be used for noise removal and dimensionality reduction. They can also generate new data from an arbitrary encoding, but that requires learning the latent code distribution, which is where variational autoencoders come in handy.

      In this tutorial, you will write your own clustering algorithm, use a dimensionality reduction algorithm to visualize and understand the data, and train a variational autoencoder to generate new data.
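
For a sense of what writing a clustering algorithm involves, here is a minimal k-means sketch in pure Python. The choice of k-means, the fixed initial centroids, and the toy points are assumptions made for illustration; the tutorial may use a different algorithm or dataset.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and moving centroids to cluster means."""
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for c, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        centroids = updated
    return assign(points, centroids), centroids


# Two well-separated toy blobs (illustrative data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)
print(centroids)
```

Real implementations usually pick the initial centroids at random and stop when the assignments no longer change, rather than running a fixed number of iterations.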

      Speakers: Adrian Alan Pol (Princeton University (US)), Savannah Jennifer Thais (Princeton University (US))
    • 12:30 13:30
      Lunch 1h 407 Jadwin Hall

    • 13:30 15:00
      Columnar Data Analysis 1h 30m 407 Jadwin Hall

      Data analysis languages, such as NumPy, MATLAB, R, IDL, and ADL, are typically interactive with an array-at-a-time interface. Instead of performing an entire analysis in a single loop, each step in the calculation is a separate pass, letting the user inspect distributions each step of the way.

      Unfortunately, these languages are limited to primitive data types: mostly numbers and booleans. Variable-length and nested data structures, such as different numbers of particles per event, don't fit this model. Fortunately, the model can be extended.

      This tutorial will introduce awkward-array, the concepts of columnar data structures, and how to use them in data analysis, such as computing combinatorics (quantities depending on combinations of particles) without any for loops.
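
The idea of a columnar layout for variable-length data can be sketched in pure Python. The example below is conceptual and does not use the awkward-array API: the helper names are hypothetical, and the list comprehensions stand in for the compiled array operations a real library would provide. Each event's particles are stored in one flat `content` list, and an `offsets` list records where each event starts and stops.

```python
def to_columnar(events):
    """Flatten a list of variable-length events into (offsets, content)."""
    offsets, content = [0], []
    for event in events:
        content.extend(event)
        offsets.append(len(content))
    return offsets, content


def from_columnar(offsets, content):
    """Rebuild the nested view from the two flat arrays."""
    return [content[start:stop] for start, stop in zip(offsets, offsets[1:])]


def counts(offsets):
    """Particles per event, read from the offsets alone."""
    return [stop - start for start, stop in zip(offsets, offsets[1:])]


# Toy transverse momenta: three events with 2, 0, and 3 particles.
events = [[50.0, 30.0], [], [20.0, 10.0, 5.0]]
offsets, content = to_columnar(events)

# An elementwise operation touches only the flat content; the offsets are reused.
scaled = from_columnar(offsets, [2.0 * pt for pt in content])

# The number of distinct particle pairs per event needs only the counts.
pairs_per_event = [n * (n - 1) // 2 for n in counts(offsets)]

print(offsets, content)
print(scaled)
print(pairs_per_event)
```

Because the structure lives entirely in `offsets`, numerical work can run over one contiguous array regardless of how many particles each event has.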

      Speakers: Ioana Ifrim (Princeton University (US)), Jim Pivarski (Princeton University)
    • 15:00 15:30
      Coffee Break 30m 407 Jadwin Hall

    • 15:30 17:30
      Columnar Data Analysis 2h 407 Jadwin Hall

      Speakers: Ioana Ifrim (Princeton University (US)), Jim Pivarski (Princeton University)
    • 18:00 20:00
      Dinner on your own 2h
    • 08:00 08:30
      Breakfast 30m 407 Jadwin Hall

    • 08:30 10:30
      Parallel Programming - An introduction to parallel computing with OpenMP 2h 407 Jadwin Hall

      We start with a discussion of the historical roots of parallel computing and how they appear in a modern context. We'll then use OpenMP and a series of hands-on exercises to explore the fundamental concepts behind parallel programming.

      Speaker: Tim Mattson (Intel)
    • 10:30 11:00
      Coffee Break 30m 407 Jadwin Hall

    • 11:00 12:30
      Parallel Programming - The OpenMP Common Core 1h 30m 407 Jadwin Hall

      We will explore through hands-on exercises the common core of OpenMP; that is, the features of the API that most OpenMP programmers use in all their parallel programs. This will provide a foundation of understanding you can build on as you explore the more advanced features of OpenMP.

      Speaker: Tim Mattson (Intel)
    • 12:30 13:30
      Lunch 1h 407 Jadwin Hall

    • 13:30 15:00
      Parallel Programming - Working with OpenMP 1h 30m 407 Jadwin Hall

      We'll explore more complex OpenMP problems and get a feel for how to work with OpenMP with real applications.

      Speaker: Tim Mattson (Intel)
    • 15:00 15:30
      Coffee Break 30m 407 Jadwin Hall

    • 15:30 17:00
      Parallel Programming - The world beyond OpenMP 1h 30m 407 Jadwin Hall

      Parallel programming is hard. There is no way to avoid that reality. We can mitigate these difficulties by focusing on the fundamental design patterns from which most parallel algorithms are constructed. Once mastered, these patterns make it much easier to understand how your problems map onto other parallel programming models. Hence for our last session on parallel programming, we'll review these essential design patterns as seen in OpenMP, and then show how they appear in cluster computing (with MPI) and GPGPU computing (with OpenCL and a bit of CUDA).
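
One such pattern, splitting a loop into chunks and combining the partial results with a reduction, can be sketched in Python as follows. This is an analogue rather than session material: the numerical integration of 4/(1+x^2) to estimate pi is an example chosen here, and the function names are hypothetical. In OpenMP the same structure would be a worksharing loop with a `reduction` clause, and in MPI each rank would compute a partial sum that is then combined with `MPI_Reduce`. Note that CPython threads do not run this CPU-bound loop in parallel because of the global interpreter lock, so the sketch shows the structure of the pattern and not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(start, stop, step):
    """Midpoint-rule contribution of the steps in [start, stop)."""
    total = 0.0
    for i in range(start, stop):
        x = (i + 0.5) * step
        total += 4.0 / (1.0 + x * x)
    return total


def integrate_pi(num_steps=100_000, num_workers=4):
    """Estimate pi by splitting the loop into chunks and reducing the partial sums."""
    step = 1.0 / num_steps
    chunk = (num_steps + num_workers - 1) // num_workers
    bounds = [(lo, min(lo + chunk, num_steps)) for lo in range(0, num_steps, chunk)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Map: each worker handles one chunk of the iteration space.
        partials = pool.map(lambda b: partial_sum(b[0], b[1], step), bounds)
        # Reduce: combine the per-chunk results into a single value.
        return step * sum(partials)


print(integrate_pi())
```

The decomposition into independent chunks followed by a reduction is what carries over between programming models; only the mechanism for launching workers and combining results changes.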

      Speaker: Tim Mattson (Intel)
    • 18:00 20:00
      School Dinner - Nassau Club 2h

      https://www.nassauclub.org

    • 08:30 09:00
      Breakfast 30m 407 Jadwin Hall

    • 09:00 09:45
      Things you didn't know you needed 45m 407 Jadwin Hall

      Speakers: Henry Fredrick Schreiner (Princeton University), Kilian Lieret
    • 09:45 10:30
      Example Application: Line Segment Tracking 45m 407 Jadwin Hall

      Speaker: Tres Reid (Cornell University (US))
    • 10:30 11:00
      Coffee Break 30m 407 Jadwin Hall

    • 11:00 12:30
      Closing Session 1h 30m 407 Jadwin Hall
