Second Computational and Data Science school for HEP (CoDaS-HEP 2018)
From Monday 23 July 2018 (08:30) to Friday 27 July 2018 (13:00)
Monday 23 July 2018
08:30 - 09:00
Breakfast
Room: 407 Jadwin Hall
09:00 - 09:15
Welcome and Overview - Peter Elmer (Princeton University (US))
Room: 407 Jadwin Hall
09:15 - 10:00
Computational and Data Science Challenges - Matthieu Lefebvre (Princeton University (US))
Room: 407 Jadwin Hall
Including What Every Physicist Should Know About Computer Architecture...
10:00 - 10:30
Setup on local compute systems
Room: 407 Jadwin Hall
10:30 - 11:00
Coffee Break
Room: 407 Jadwin Hall
11:00 - 12:30
Version Control with Git and GitHub - David Luet (Princeton University)
Room: 407 Jadwin Hall
Fundamentally, a Version Control System (VCS) is a system that records changes to a file or set of files over time, so that you can recall specific versions later. Git is a modern VCS that is fast and flexible to use, thanks in part to its lightweight branch creation. Git's popularity is due in part to the availability of cloud hosting services like GitHub, Bitbucket, and GitLab. Hosting a Git repository on a remote service like GitHub greatly facilitates collaborative work and lets you frequently back up your work on a remote host. We will start this talk by introducing the fundamental concepts of Git. The second part of the talk will show how to publish to a remote repository on GitHub. No prior knowledge of Git or version control is necessary, but some familiarity with the Linux command line is expected.
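The session itself works on the command line; purely as a Python-flavored sketch of the same record-and-publish cycle, here is what the workflow looks like via the GitPython package. The repository name, file, and remote URL are invented for illustration.

```python
from pathlib import Path
from git import Repo  # GitPython package: pip install GitPython

# Record changes locally: init, add, commit
repo = Repo.init("myproject")                 # like `git init myproject`
Path("myproject/README.md").write_text("CoDaS-HEP notes\n")
repo.index.add(["README.md"])                 # like `git add README.md`
repo.index.commit("Initial commit")           # like `git commit -m "Initial commit"`

# Publish to a remote host (hypothetical URL; pushing needs a real remote)
# origin = repo.create_remote("origin", "https://github.com/you/myproject.git")
# origin.push(refspec="master:master")        # like `git push origin master`
```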
12:30 - 13:30
Lunch
Room: 407 Jadwin Hall
13:30 - 15:00
Parallel Programming - An introduction to parallel computing with OpenMP - Tim Mattson (Intel)
Room: 407 Jadwin Hall
We start with a discussion of the historical roots of parallel computing and how they appear in a modern context. We'll then use OpenMP and a series of hands-on exercises to explore the fundamental concepts behind parallel programming.
15:00 - 15:30
Coffee Break
Room: 407 Jadwin Hall
15:30 - 17:30
Parallel Programming - The OpenMP Common Core - Tim Mattson (Intel)
Room: 407 Jadwin Hall
Through hands-on exercises, we will explore the common core of OpenMP: the features of the API that most OpenMP programmers use in all their parallel programs. This will provide a foundation of understanding you can build on as you explore the more advanced features of OpenMP.
17:30 - 18:30
Welcome Reception
Tuesday 24 July 2018
08:00 - 08:30
Breakfast
Room: 407 Jadwin Hall
08:30 - 10:30
Parallel Programming - Working with OpenMP - Tim Mattson (Intel)
Room: 407 Jadwin Hall
We'll explore more complex OpenMP problems and get a feel for how to use OpenMP in real applications.
10:30 - 10:40
Group Photo - Jadwin Hall plaza
10:40 - 11:00
Coffee Break
Room: 407 Jadwin Hall
11:00 - 12:30
Parallel Programming - The world beyond OpenMP - Tim Mattson (Intel)
Room: 407 Jadwin Hall
Parallel programming is hard. There is no way to avoid that reality. We can mitigate these difficulties by focusing on the fundamental design patterns from which most parallel algorithms are constructed. Once mastered, these patterns make it much easier to understand how your problems map onto other parallel programming models. Hence for our last session on parallel programming, we'll review these essential design patterns as seen in OpenMP, and then show how they appear in cluster computing (with MPI) and GPGPU computing (with OpenCL and a bit of CUDA).
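As a small taste of how the parallel-loop pattern carries across programming models, here is a sketch in Python using Numba's prange (Numba appears later in the school's scientific Python session). This is an analogue of OpenMP's parallel-for with a reduction, not OpenMP itself; the function and array size are illustrative.

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def sum_of_squares(x):
    # The parallel-loop ("map") pattern with a reduction: iterations are
    # independent, so Numba distributes them across threads, much like
    # `#pragma omp parallel for reduction(+:total)` in OpenMP.
    total = 0.0
    for i in prange(x.size):
        total += x[i] * x[i]  # Numba recognizes this as a reduction
    return total

x = np.random.rand(1_000_000)
print(sum_of_squares(x))
```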
12:30 - 13:30
Lunch
Room: 407 Jadwin Hall
13:30 - 15:00
Machine Learning - Alfredo Canziani (NYU Center for Data Science), Alexey Svyatkovskiy (Princeton University)
Room: 407 Jadwin Hall
Machine learning (ML) is a thriving field with active research topics. It has found numerous practical applications in natural language processing, speech and image understanding, as well as the fundamental sciences. ML approaches are capable of replicating, and often surpassing, the accuracy of hypothesis-driven first-principles simulations, and can provide new insights into a research problem. Here we provide an overview of the content of the Machine Learning tutorials. Although the theory and practice sessions are described separately, they will be taught in alternation across the four lectures: each time we introduce new concepts, we can immediately use them in a tailored exercise, which helps us absorb the material covered.

Theory: We'll start with a gentle introduction to the ML field, introducing the three learning paradigms: supervised, unsupervised, and reinforcement learning. We'll then delve into the two supervised sub-categories, regression and classification, using neural nets' forward and backward propagation. We'll face overfitting and fight it with regularisation. We'll soon see that smart choices can be made to exploit the nature of the data we're dealing with, and introduce convolutional, spectral, recurrent, and graph neural nets. We'll move on to unsupervised learning and familiarise ourselves with generative models, such as variational autoencoders and generative adversarial networks.

Practice: We will introduce machine learning technology focusing on the open-source software stack, namely the PyTorch and Keras frameworks: a brief introduction to PyTorch's architecture, primitives, and automatic differentiation; implementing multi-layer perceptrons and convolutional layers; a deep dive into recurrent neural networks for sequence learning tasks; and an introduction to Keras. We will learn to debug machine learning applications and visualise the training and validation process with pytorchviz or TensorBoard, and discuss ways to train multi-GPU and distributed models on a cluster with the Horovod package. All exercises will use PyTorch or Keras. Python programming experience is desirable, but previous experience with PyTorch and Keras is not required.
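To give a flavour of the practice sessions, here is a minimal PyTorch sketch of a multi-layer perceptron trained by forward and backward propagation. The layer sizes and synthetic data are illustrative, not taken from the tutorial materials.

```python
import torch
from torch import nn

# Synthetic binary-classification data (illustrative only)
X = torch.randn(256, 4)                      # 256 samples, 4 features
y = (X.sum(dim=1) > 0).float().unsqueeze(1)  # toy labels

# A small multi-layer perceptron
model = nn.Sequential(
    nn.Linear(4, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
)
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)  # forward propagation
    loss.backward()              # backward propagation (autograd)
    optimizer.step()             # gradient-descent update

print(f"final loss: {loss.item():.4f}")
```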
15:00 - 15:30
Coffee Break
Room: 407 Jadwin Hall
15:30 - 16:15
Machine Learning - Alexey Svyatkovskiy (Princeton University), Alfredo Canziani (NYU Center for Data Science)
Room: 407 Jadwin Hall
16:15 - 17:00
Computing for Big Science in the next Decade, plus HEP and Quantum Computing (Guest Lecture) - Elizabeth Sexton-Kennedy (Fermi National Accelerator Lab. (US))
Room: 407 Jadwin Hall
18:30 - 20:30
Social Mixer - Prospect House
Wednesday 25 July 2018
08:00 - 08:30
Breakfast
Room: 407 Jadwin Hall
08:30 - 09:30
The Use and Abuse of Random Numbers - Daniel Sherman Riley (Cornell University (US))
Room: 407 Jadwin Hall
09:30 - 10:30
Floating Point Arithmetic Is Not Real - Matthieu Lefebvre (Princeton University (US))
Room: 407 Jadwin Hall
10:30 - 11:00
Coffee Break
Room: 407 Jadwin Hall
11:00 - 12:30
The Scientific Python Ecosystem - Jim Pivarski (Princeton University)
Room: 407 Jadwin Hall
In recent years, Python has become a glue language for scientific computing. Although code written in Python is generally slow, Python has a good C API, Numpy provides a common data abstraction, and many data processing, statistical, and machine learning packages offer a Python interface as a matter of course. This tutorial will introduce you to core Python packages for science (Numpy, Pandas, SciPy, Numba, Dask) as well as HEP-specific tools (uproot, histbook, NumPythia, pyjet), and show how to connect them in analysis code.
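As a minimal sketch of Numpy serving as the common data abstraction, assuming nothing beyond Numpy and Pandas themselves (the column names and values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Numpy arrays: whole-array operations run in compiled code
pt = np.array([25.3, 47.1, 12.8, 88.0])   # illustrative "transverse momenta"
eta = np.array([0.5, -1.2, 2.1, 0.3])
mask = (pt > 20.0) & (np.abs(eta) < 1.5)  # vectorized selection, no Python loop

# The same arrays serve as Pandas columns, connecting the two packages
df = pd.DataFrame({"pt": pt, "eta": eta})
print(df[mask].describe())
```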
12:30 - 13:30
Lunch
Room: 407 Jadwin Hall
13:30 - 15:00
Machine Learning - Alexey Svyatkovskiy (Princeton University), Alfredo Canziani (NYU Center for Data Science)
Room: 407 Jadwin Hall
15:00 - 15:30
Coffee Break
Room: 407 Jadwin Hall
15:30 - 16:15
Machine Learning - Alexey Svyatkovskiy (Princeton University), Alfredo Canziani (NYU Center for Data Science)
Room: 407 Jadwin Hall
16:15 - 17:00
Charged Particle Tracking Reconstruction (Guest Lecture) - Slava Krutelyov (Univ. of California San Diego (US))
Room: 407 Jadwin Hall
18:00 - 20:00
Dinner on your own
Thursday 26 July 2018
08:00 - 08:30
Breakfast
Room: 407 Jadwin Hall
08:30 - 10:30
Vector Parallelism for Kalman-Filter-Based Particle Tracking on Multi- and Many-Core Processors - Steven R Lantz (Cornell University (US))
Room: 407 Jadwin Hall
All modern CPUs boost their performance through vector processing units (VPUs). Typically this gain is achieved not by the programmer but by the compiler, through automatic vectorization of simple loops in the source code. Compilers generate SIMD instructions that operate on multiple numbers simultaneously by loading them together into extra-wide registers. Intel's latest processors feature a plethora of vector registers, as well as 1 or 2 VPUs per core that operate on 16 floats or 8 doubles in every cycle. Vectorization is an important component of parallel performance on CPUs, and to maximize performance, it is vital to consider how well one's code is being vectorized by the compiler.

In the first part of our presentation, we look at simple code examples that illustrate how vectorization works and the crucial role of memory bandwidth in limiting the vector processing rate. What does it really take to reach the processor's nominal peak of floating-point performance? What can we learn from things like roofline analysis and compiler optimization reports?

In the second part, we consider how a physics application may be restructured to take better advantage of vectorization. In particular, we focus on the Matriplex concept that is used to implement parallel Kalman filtering in our group's particle tracking R&D project. Drastic changes to data structures and loops were required to help the compiler find the SIMD opportunities in the algorithm. In certain places, vector operations were even enforced through calls to intrinsic functions. We examine a suite of test codes that helped to isolate the performance impact of the Matriplex class on the basic Kalman filter operations.
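The session works in compiled C/C++; purely as an illustration of the loop-level idea in the school's other language, here is a Python/NumPy sketch contrasting an interpreted element-by-element loop with an array-at-a-time operation that executes in compiled (and typically SIMD-vectorized) code. The array size is illustrative.

```python
import time
import numpy as np

n = 1_000_000
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)

# Element-by-element Python loop: one interpreted iteration per element
t0 = time.perf_counter()
c = np.empty_like(a)
for i in range(n):
    c[i] = a[i] + b[i]
t1 = time.perf_counter()

# Array-at-a-time: a single call into compiled, SIMD-capable code
t2 = time.perf_counter()
d = a + b
t3 = time.perf_counter()

print(f"loop: {t1 - t0:.3f} s, array-at-a-time: {t3 - t2:.5f} s")
```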
10:30 - 11:00
Coffee Break
Room: 407 Jadwin Hall
11:00 - 12:30
Introduction to Performance Tuning & Optimization Tools - Ian Cosden (Princeton University)
Room: 407 Jadwin Hall
Improving the performance of scientific code is often considered to be some combination of difficult, mysterious, and time-consuming, but it doesn't have to be. Performance tuning and optimization tools can greatly aid in evaluating and understanding the performance of scientific code. In this talk we will discuss how to approach performance tuning and introduce some measurement tools to evaluate the performance of compiled-language (C/C++/Fortran) code. Powerful profiling tools, such as Intel VTune and Advisor, will be introduced and demonstrated in practical applications. A hands-on example will allow students to gain some familiarity with VTune in a simple yet realistic setting. Some of the more advanced features of VTune, including the ability to access the performance hardware counters on modern CPUs, will be introduced.
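VTune and Advisor target compiled code; as a lightweight analogue of the same hot-spot-finding workflow in Python, here is a sketch using the standard-library cProfile and pstats modules. The workload functions are invented for illustration.

```python
import cProfile
import pstats

def slow_sum(n):
    # Deliberately naive hot spot (hypothetical example function)
    total = 0.0
    for i in range(n):
        total += i ** 0.5
    return total

def analysis():
    return sum(slow_sum(200_000) for _ in range(20))

# Profile the workload, then print the functions where time is spent,
# analogous to a VTune "hotspots" analysis at much coarser granularity.
cProfile.run("analysis()", "profile.out")
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(5)
```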
12:30 - 13:30
Lunch
Room: 407 Jadwin Hall
13:30 - 15:00
Low-level Python - Jim Pivarski (Princeton University)
Room: 407 Jadwin Hall
Python is a high-level language that usually hides "bare metal" details from the user. This is desirable when organizing a complex workflow, but it can get in the way of performance or interfacing with C/C++ code. This tutorial will demonstrate how to "jailbreak" your Python for low-level computing. It will include Numpy tricks, memory-mapped files, mixing C++ and Python through Cython, GPU programming through PyCUDA, and accessing ROOT functions from Python without loss of performance.
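As one small taste of these techniques, here is a sketch of a memory-mapped file with Numpy: the array lives on disk and pages are loaded on demand, so files larger than RAM can be sliced like ordinary arrays. The file name and shape are illustrative.

```python
import numpy as np

# Create a file-backed array on disk (illustrative name and shape)
fp = np.memmap("events.dat", dtype=np.float32, mode="w+", shape=(1_000_000, 4))
fp[:] = np.random.rand(1_000_000, 4)  # written through to the file
fp.flush()
del fp

# Reopen read-only: only the pages actually touched are read from disk
data = np.memmap("events.dat", dtype=np.float32, mode="r", shape=(1_000_000, 4))
print(data[123456])       # random access without loading the whole file
print(data[:, 0].mean())  # a column reduction streams through the file
```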
15:00 - 15:30
Coffee Break
Room: 407 Jadwin Hall
15:30 - 16:15
Afternoon Session - Machine Learning
Room: 407 Jadwin Hall
16:15 - 17:00
The role of machine learning in extracting the secrets of the Higgs (Guest Lecture) - Heather Gray (LBNL)
Room: 407 Jadwin Hall
Machine learning has transformed how many analyses are performed at the LHC. I will demonstrate this by showing how machine learning has been, and continues to be, used to study selected properties of the Higgs boson. I will discuss selected example analyses, highlighting the sensitivity and other improvements gained from machine learning, and conclude by discussing limitations and future perspectives.
18:00 - 21:00
School Dinner - Despana (235A Nassau Street, corner of Nassau St and Olden)
Friday 27 July 2018
08:30 - 09:00
Breakfast
Room: 407 Jadwin Hall
09:00 - 10:30
Machine Learning Topics - Alfredo Canziani (NYU Center for Data Science), Alexey Svyatkovskiy (Princeton University)
Room: 407 Jadwin Hall
10:30 - 11:00
Coffee Break
Room: 407 Jadwin Hall
11:00 - 12:30
Closing Session
Room: 407 Jadwin Hall
12:30 - 13:00
Take-Away Lunch
Room: 407 Jadwin Hall