First Computational and Data Science school for HEP (CoDaS-HEP)

US/Eastern
407 Jadwin Hall (Princeton University)

407 Jadwin Hall

Princeton University

Princeton Center For Theoretical Science (PCTS)
Description

The first school on tools, techniques and methods for Computational and Data Science for High Energy Physics (CoDaS-HEP) will take place on 10-13 July, 2017, at Princeton University.

Advanced software is a critical ingredient to scientific research. Training young researchers in the latest tools and techniques is an essential part of developing the skills required for a successful career both in research and in industry.

The CoDaS-HEP school aims to provide a broad introduction to these critical skills as well as an overview of their applications in High Energy Physics. Specific topics to be covered at the school include:

  • Parallel Programming 
  • Big Data Tools and Techniques
  • Machine Learning 
  • Practical skills like performance evaluation, use of git, etc.

The school offers a limited number of young researchers an opportunity to learn these skills from experienced scientists and instructors. Successful applicants will receive travel and lodging support to attend the school.

School website: http://codas-hep.org

The school lectures will take place in 407 Jadwin Hall, in the main lecture hall of the Princeton Center for Theoretical Science (PCTS).

The draft timetable/program is online; further details will be added in the coming weeks.

This project is supported by National Science Foundation grants PHY-1520942, PHY-1520969, PHY-1521042 and ACI-1450377 and by the Princeton Institute for Computational Science and Engineering (PICSciE). Any opinions, findings, conclusions or recommendations expressed in this material are those of the developers and do not necessarily reflect the views of the National Science Foundation.

 

    • 08:30 09:00
      Breakfast 30m 407 Jadwin Hall
    • 09:00 09:15
      Workshop Welcome and Overview 15m 407 Jadwin Hall
      Speaker: Peter Elmer (Princeton University (US))
    • 09:15 10:00
      Computational and Data Science Challenges 45m 407 Jadwin Hall

      Including What Every Physicist Should Know About Computer Architecture...

      Speaker: Matthieu Lefebvre (Princeton University (US))
    • 10:00 10:30
      Setup on local compute systems 30m 407 Jadwin Hall
    • 10:30 11:00
      Coffee Break 30m 407 Jadwin Hall
    • 11:00 12:30
      Version Control with Git and GitHub 1h 30m 407 Jadwin Hall

      Fundamentally, a Version Control System (VCS) is a system that records changes to a file or set of files over time, so that you can recall specific versions later.

      Git is a modern VCS that is fast and flexible to use thanks to its lightweight branch creation. Git is very popular, due in part to the availability of cloud hosting services like GitHub, Bitbucket and GitLab. Hosting a Git repository on a remote service like GitHub greatly facilitates collaborative work and lets you frequently back up your work on a remote host.

      We will start this talk by introducing the fundamental concepts of Git. The second part of the talk will show how to publish to a remote repository on GitHub.

      No prior knowledge of Git or version control will be necessary, but some familiarity with the Linux command line will be expected.

      Speaker: David Luet (Princeton University)
    • 12:30 13:30
      Lunch 1h 407 Jadwin Hall
    • 13:30 14:15
      Floating Point Arithmetic Is Not Real 45m 407 Jadwin Hall
      Speaker: Matthieu Lefebvre (Princeton University (US))
    • 14:15 15:00
      The Use and Abuse of Random Numbers 45m 407 Jadwin Hall
      Speaker: Daniel Sherman Riley (Cornell University (US))
    • 15:00 15:30
      Coffee Break 30m 407 Jadwin Hall
    • 15:30 17:00
      Functional Programming for Data Analysis 1h 30m 407 Jadwin Hall

      Even though both involve programming, data analysis is a different activity from software engineering, with different best practices.

      However, the recent interest in functional programming as a good practice for parallelization applies to data analysis as well. We'll mix theory (immutable data, structural sharing, combinators) with practical examples in Spark and (possibly) other functional programming-based analysis frameworks.

      Only minimal prior knowledge of specific languages and frameworks is required (we'll introduce what we need), but a flexible mindset helps.
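
As a rough illustration of the combinator style mentioned above, here is a minimal sketch in plain Python rather than Spark. The `Muon` record, its `pt` and `eta` fields, and the sample values are hypothetical and chosen only for illustration. The data are immutable tuples, and the analysis is expressed as a chain of `filter`, `map` and `reduce` steps with no in-place updates.

```python
from functools import reduce
from typing import NamedTuple


class Muon(NamedTuple):
    """Hypothetical immutable particle record (illustrative fields only)."""
    pt: float
    eta: float


# Each event is an immutable tuple of muons; the values are made up.
events = (
    (Muon(45.2, 0.3), Muon(12.1, -1.7)),
    (Muon(8.4, 2.2),),
    (),
    (Muon(60.0, -0.5), Muon(33.3, 1.1), Muon(5.0, 0.0)),
)


def is_central(muon):
    """Pure predicate: keep muons with |eta| < 2.0 (arbitrary cut)."""
    return abs(muon.eta) < 2.0


def leading_pt(muons):
    """Pure function: highest pt in an event, or None if the event is empty."""
    return max((m.pt for m in muons), default=None)


# filter: select muons per event without modifying the input.
selected = tuple(tuple(filter(is_central, event)) for event in events)

# map, then filter: one leading pt per non-empty event.
leading = tuple(pt for pt in map(leading_pt, selected) if pt is not None)

# reduce: fold the per-event values into a single number.
total_leading_pt = reduce(lambda acc, pt: acc + pt, leading, 0.0)

print(leading, total_leading_pt)
```

Because every step is a pure function of its input, the per-event work could be distributed across workers without coordination, which is the property frameworks like Spark rely on.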

      Speaker: Jim Pivarski (Princeton University)
    • 17:30 18:30
      Welcome Reception 1h Lewis Science Library Atrium

    • 08:00 08:30
      Breakfast 30m 407 Jadwin Hall
    • 08:30 10:30
      Parallel Programming - An introduction to parallel computing with OpenMP 2h 407 Jadwin Hall

      We start with a discussion of the historical roots of parallel computing and how they appear in a modern context. We'll then use OpenMP and a series of hands-on exercises to explore the fundamental concepts behind parallel programming.

      Speaker: Tim Mattson (Intel)
    • 10:30 10:40
      Group Photo - Jadwin Hall plaza 10m
    • 10:40 11:00
      Coffee Break 20m 407 Jadwin Hall
    • 11:00 12:30
      Parallel Programming - The OpenMP Common Core 1h 30m 407 Jadwin Hall

      We will explore through hands-on exercises the common core of OpenMP; that is, the features of the API that most OpenMP programmers use in all their parallel programs. This will provide a foundation of understanding you can build on as you explore the more advanced features of OpenMP.

      Speaker: Tim Mattson (Intel)
    • 12:30 13:30
      Lunch 1h 407 Jadwin Hall
    • 13:30 15:00
      Machine Learning Technology 1h 30m 407 Jadwin Hall

      Machine learning (ML) is a thriving field with active research topics. It has found numerous practical applications in natural language processing, understanding of speech and images, and the fundamental sciences. ML approaches are capable of replicating and often surpassing the accuracy of hypothesis-driven first-principles simulations and can provide new insights into a research problem.

      This session will introduce machine learning technology focusing on the open source software stack built around TensorFlow and Apache Spark frameworks.

      • Brief introduction to the TensorFlow architecture and its primitives; implementing fully connected and convolutional layers; deep dive into higher-level APIs including tf.layers, estimators and Keras.
      • Learn to debug machine learning applications and visualize the training and cross-validation process with TensorBoard. Hands-on demo: debugging a convolutional neural net. Discuss ways to train multi-GPU and distributed models on a cluster.
      • Introduction to Spark transformations and actions; loading data into RDDs, DataFrames and Datasets; writing user-defined functions (UDF, UDAF). Discuss how to use Spark ML: transformers, estimators, pipelines. Creating your own UnaryTransformer.

      All exercises will use a mix of TensorFlow (Python API), PySpark and Spark ML (both parts of Apache Spark). Python programming experience is desirable, but previous experience with TensorFlow, Spark or distributed computing is not required.

      Speaker: Alexey Svyatkovskiy (Princeton University)
    • 15:00 15:30
      Coffee Break 30m 407 Jadwin Hall
    • 15:30 16:15
      Machine Learning Technology 45m 407 Jadwin Hall
      Speaker: Alexey Svyatkovskiy (Princeton University)
    • 16:15 17:00
      Machine Learning for Neutrino Physics (Guest Lecture) 45m 407 Jadwin Hall
      Speaker: Dr Kazuhiro Terao (SLAC)
    • 18:30 20:30
      Social Mixer - The Nassau Club 2h 6 Mercer Street, Princeton

      Food and drinks at the Nassau Club

    • 08:00 08:30
      Breakfast 30m 407 Jadwin Hall
    • 08:30 10:30
      Parallel Programming - Working with OpenMP 2h 407 Jadwin Hall

      We'll explore more complex OpenMP problems and get a feel for how to work with OpenMP with real applications.

      Speaker: Tim Mattson (Intel)
    • 10:30 11:00
      Coffee Break 30m 407 Jadwin Hall
    • 11:00 12:30
      Parallel Programming - The world beyond OpenMP 1h 30m 407 Jadwin Hall

      Parallel programming is hard. There is no way to avoid that reality. We can mitigate these difficulties by focusing on the fundamental design patterns from which most parallel algorithms are constructed. Once mastered, these patterns make it much easier to understand how your problems map onto other parallel programming models. Hence for our last session on parallel programming, we'll review these essential design patterns as seen in OpenMP, and then show how they appear in cluster computing (with MPI) and GPGPU computing (with OpenCL and a bit of CUDA).
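
One such pattern, splitting a loop across workers and then combining their partial results, can be sketched as follows. This is a structural analogy in Python, not OpenMP code and not the lecture's material. The function names are hypothetical, and the numerical integration of 4/(1+x²) on [0, 1] (which equals π) is just a convenient test problem. With CPython threads the global interpreter lock means no real speedup should be expected, so the sketch shows only the shape of the decomposition.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def partial_sum(start, stop, step):
    """Midpoint-rule contributions for loop indices in [start, stop)."""
    total = 0.0
    for i in range(start, stop):
        x = (i + 0.5) * step
        total += 4.0 / (1.0 + x * x)
    return total


def parallel_pi(num_steps=100_000, num_workers=4):
    """Split the loop into contiguous chunks, then reduce the partial sums."""
    step = 1.0 / num_steps
    # Chunk boundaries: worker w handles indices [bounds[w][0], bounds[w][1]).
    bounds = [
        (w * num_steps // num_workers, (w + 1) * num_steps // num_workers)
        for w in range(num_workers)
    ]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(lambda b: partial_sum(b[0], b[1], step), bounds))
    # Reduction step: combine one private result per worker.
    return step * sum(partials)


pi_estimate = parallel_pi()
print(pi_estimate, abs(pi_estimate - math.pi))
```

Each worker accumulates into its own private variable and the results are combined only once at the end, which avoids contention on a shared accumulator.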

      Speaker: Tim Mattson (Intel)
    • 12:30 13:30
      Lunch 1h 407 Jadwin Hall
    • 13:30 15:00
      Machine Learning Methods 1h 30m 407 Jadwin Hall

      This session will cover the basics of machine learning and deep learning. We will discuss basic learning algorithms; overfitting and regularization; hyperparameter search (grid search, random search) and cross-validation (stratified, k-fold); the bias-variance trade-off and learning curves.
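
To make the k-fold idea concrete, here is a minimal sketch of how sample indices can be split into training and validation sets. The function name `k_fold_indices` is hypothetical, and the sketch omits shuffling and stratification, both of which a real analysis would usually want.

```python
def k_fold_indices(n_samples, k):
    """Yield (training, validation) index lists for k contiguous folds.

    The first n_samples % k folds receive one extra sample so that
    every index appears in exactly one validation set.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


splits = list(k_fold_indices(n_samples=5, k=2))
for training, validation in splits:
    print("train:", training, "validate:", validation)
```

A model would be trained once per split and the validation scores averaged, giving an estimate of performance that does not depend on a single train/validation partition.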

      • Supervised learning: decision trees and random forests, bootstrap aggregation and boosting;
        deep feed-forward neural networks, forward propagation, back propagation, dropout regularization, Stochastic Gradient Descent; why we train on mini-batches; brief introduction to convolutional networks
      • Unsupervised learning: k-means clustering; locality sensitive hashing families, MinHash, Jaccard similarity. Case study: natural language processing

      Bonus: deep recurrent neural networks, unfolding through time, BPTT; LSTM. Case study: analyzing time series data of variable length.
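
The relationship between Jaccard similarity and MinHash listed above can be sketched in a few lines. This is an illustrative toy, not the session's code. The helper names, the use of seeded SHA-256 as the hash family, and the two sample sentences are all assumptions made for the example.

```python
import hashlib


def jaccard(a, b):
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B| of two token collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def seeded_hash(token, seed):
    """One member of a hash family, selected by seed (64-bit integer)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def minhash_signature(tokens, num_hashes=128):
    """Signature: the minimum hash value under each hash function.

    Requires a non-empty token collection.
    """
    unique = set(tokens)
    return [min(seeded_hash(t, seed) for t in unique) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots estimates the Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


doc_a = "the higgs boson decays to two photons".split()
doc_b = "the higgs boson decays to four leptons".split()

exact = jaccard(doc_a, doc_b)
approx = estimate_jaccard(minhash_signature(doc_a), minhash_signature(doc_b))
print(exact, approx)
```

The signatures have a fixed length regardless of document size, which is what makes MinHash useful for comparing many large documents.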

      Speaker: Alexey Svyatkovskiy (Princeton University)
    • 15:00 15:30
      Coffee Break 30m 407 Jadwin Hall
    • 15:30 16:15
      Machine Learning Methods 45m 407 Jadwin Hall
      Speaker: Alexey Svyatkovskiy (Princeton University)
    • 16:15 17:00
      Parallel Charged Particle Tracking Reconstruction (Guest Lecture) 45m 407 Jadwin Hall
      Speaker: Peter Wittich (Cornell University (US))
    • 18:00 20:00
      Dinner on your own 2h
    • 08:00 08:30
      Breakfast 30m 407 Jadwin Hall
    • 08:30 10:30
      Vector Parallelism for Kalman-Filter-Based Particle Tracking on Multi- and Many-Core Processors 2h 407 Jadwin Hall

      All modern CPUs boost their performance through vector processing units (VPUs). Typically this gain is achieved not by the programmer, but by the compiler through automatic vectorization of simple loops in the source code. Compilers generate SIMD instructions that operate on multiple numbers simultaneously by loading them together into extra-wide registers. Intel's latest processors feature a plethora of vector registers, as well as 1 or 2 VPUs per core that operate on 16 floats or 8 doubles in every cycle. Vectorization is an important component of parallel performance on CPUs, and to maximize performance, it is vital to consider how well one's code is being vectorized by the compiler.

      In the first part of our presentation, we look at simple code examples that illustrate how vectorization works and the crucial role of memory bandwidth in limiting the vector processing rate. What does it really take to reach the processor's nominal peak of floating-point performance? What can we learn from things like roofline analysis and compiler optimization reports?
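
The interplay between nominal peak and memory bandwidth can be put into numbers with the basic roofline formula, attainable rate = min(peak, bandwidth × arithmetic intensity). The sketch below uses hypothetical hardware figures (one core at 1 GHz, 10 GB/s of memory bandwidth) and assumes two floating-point operations per lane per cycle from fused multiply-add. None of these values come from the lecture.

```python
def peak_gflops(cores, ghz, vpus_per_core, lanes, flops_per_lane=2):
    """Nominal peak in GFLOP/s; flops_per_lane=2 assumes fused multiply-add."""
    return cores * ghz * vpus_per_core * lanes * flops_per_lane


def attainable_gflops(peak, bandwidth_gb_s, flops_per_byte):
    """Basic roofline: limited by compute or by memory traffic, whichever is lower."""
    return min(peak, bandwidth_gb_s * flops_per_byte)


# Hypothetical single core: 1.0 GHz, 2 VPUs, 16 single-precision lanes each.
peak = peak_gflops(cores=1, ghz=1.0, vpus_per_core=2, lanes=16)

# y[i] = a * x[i] + y[i] on 4-byte floats: 2 flops per 12 bytes moved
# (load x, load y, store y), so the loop is memory bound.
low_intensity = attainable_gflops(peak, bandwidth_gb_s=10.0, flops_per_byte=2 / 12)

# A kernel that reuses each loaded byte many times can reach the compute roof.
high_intensity = attainable_gflops(peak, bandwidth_gb_s=10.0, flops_per_byte=100.0)

print(peak, low_intensity, high_intensity)
```

Under these assumed numbers the streaming loop reaches under 3% of peak however well it is vectorized, which is why arithmetic intensity matters as much as SIMD width.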

      In the second part, we consider how a physics application may be restructured to take better advantage of vectorization. In particular, we focus on the Matriplex concept that is used to implement parallel Kalman filtering in our group's particle tracking R&D project. Drastic changes to data structures and loops were required to help the compiler find the SIMD opportunities in the algorithm. In certain places, vector operations were even enforced through calls to intrinsic functions. We examine a suite of test codes that helped to isolate the performance impact of the Matriplex class on the basic Kalman filter operations.
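
The data-layout idea behind this kind of restructuring can be illustrated with a toy sketch. It is not the project's Matriplex class, which is C++, and the function names here are hypothetical. The assumption being illustrated is that N small matrices are stored so that the same element (i, j) of all N matrices sits contiguously, which turns the innermost loop into a unit-stride loop over matrices that a compiler could map onto SIMD lanes.

```python
def pack(matrices, dim):
    """Store element (i, j) of all N matrices contiguously: plex[i*dim + j][n]."""
    return [[m[i][j] for m in matrices] for i in range(dim) for j in range(dim)]


def unpack(plex, dim):
    """Inverse of pack: recover a list of N ordinary dim x dim matrices."""
    n = len(plex[0])
    return [
        [[plex[i * dim + j][k] for j in range(dim)] for i in range(dim)]
        for k in range(n)
    ]


def multiply(a, b, dim):
    """C[n] = A[n] @ B[n] for all n, with the loop over n innermost."""
    n = len(a[0])
    c = [[0.0] * n for _ in range(dim * dim)]
    for i in range(dim):
        for j in range(dim):
            out = c[i * dim + j]
            for k in range(dim):
                lhs, rhs = a[i * dim + k], b[k * dim + j]
                # Unit-stride loop over the N matrices: the vectorizable part.
                for m in range(n):
                    out[m] += lhs[m] * rhs[m]
    return c


a_list = [[[1, 2], [3, 4]], [[0, 1], [1, 0]]]
b_list = [[[1, 0], [0, 1]], [[5, 6], [7, 8]]]
products = unpack(multiply(pack(a_list, 2), pack(b_list, 2), 2), 2)
print(products)
```

In pure Python this gains nothing in speed; the point is only that the innermost loop touches consecutive memory and performs the same operation on every element.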

      Speakers: Steven R Lantz (Cornell University (US)), Matevz Tadel (Univ. of California San Diego (US))
    • 10:30 11:00
      Coffee Break 30m 407 Jadwin Hall
    • 11:00 12:30
      Introduction to Performance Tuning & Optimization Tools 1h 30m 407 Jadwin Hall

      Improving the performance of scientific code is often considered to be some combination of difficult, mysterious, and time-consuming, but it doesn't have to be. Performance tuning and optimization tools can greatly aid in evaluating and understanding the performance of scientific code. In this talk we will discuss how to approach performance tuning and introduce some measurement tools to evaluate the performance of compiled-language (C/C++/Fortran) code. Powerful profiling tools, such as Intel VTune and Advisor, will be introduced and demonstrated in practical applications. A hands-on example will allow students to gain some familiarity with VTune in a simple yet realistic setting. Some of the more advanced features of VTune, including the ability to access the hardware performance counters on modern CPUs, will be introduced.

      Speaker: Ian Cosden (Princeton University)
    • 12:30 13:30
      Lunch 1h 407 Jadwin Hall
    • 13:30 15:00
      Language Interoperability 1h 30m 407 Jadwin Hall

      Getting data out of one language/framework and into another, mostly Python <--> C++ and Java <--> C++ or Python.

      Advanced NumPy tricks for writing into arrays in C++ libraries and into other processes via shared memory, and for allocating NumPy arrays (and using Python) in NUMA environments, particularly KNL and GPU.

      Java to C++/CPython hooks: JNI, JNA, off-heap memory.

      On-the-fly compilation, which often factors into the above.

      Will assume familiarity with C/C++, Python, and Java. Since it's an overview of many similar topics, rather than a deep dive into any one of them, students will still benefit if they are familiar with and interested in only two of the three languages.
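
A small standard-library sketch of the zero-copy idea behind Python <--> C/C++ data exchange follows. It uses `array` and `ctypes` instead of NumPy to stay self-contained, and `scale_in_place` is a hypothetical Python stand-in for a compiled routine that would receive a `double*` and a length. The point is that the C-typed view and the Python array share one block of memory, so nothing is copied at the language boundary.

```python
import ctypes
from array import array

# A Python-owned buffer of C doubles.
samples = array("d", [1.0, 2.0, 3.0, 4.0])

# A ctypes array type of matching length, viewing the same memory (no copy).
CDoubleArray = ctypes.c_double * len(samples)
c_view = CDoubleArray.from_buffer(samples)


def scale_in_place(buf, n, factor):
    """Hypothetical stand-in for a C/C++ function taking (double*, size_t, double)."""
    for i in range(n):
        buf[i] *= factor


# Writing through the C view is visible from the Python array.
scale_in_place(c_view, len(samples), 2.0)
print(list(samples))

# The raw address that a real foreign function would be given.
address = ctypes.addressof(c_view)
```

While the `ctypes` view exists, the underlying `array` cannot be resized, which reflects the usual rule that memory handed to foreign code must stay put.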

      Speaker: Jim Pivarski (Princeton University)
    • 15:00 15:30
      Coffee Break 30m 407 Jadwin Hall
    • 15:30 16:15
      Afternoon Session 45m 407 Jadwin Hall
    • 16:15 17:00
      Closing Session 407 Jadwin Hall
    • 18:30 21:30
      School Dinner - Trattoria Procaccini 3h 354 Nassau Street, Princeton