CERN School of Computing 2022

Europe/Warsaw
AGH University of Science and Technology

al. Mickiewicza 30, 30-059 Kraków, Poland
Agnieszka Dziurda (IFJ PAN), Sebastian Lopienski (CERN), Tomasz Szumlak (AGH), Joelma Tolomeo (CERN)
Description

Welcome to the 43rd CERN School of Computing (CSC 2022)! The school will take place from 4 to 17 September in the beautiful city of Kraków, Poland.

This year’s School is organized in collaboration with AGH University of Science and Technology (AGH) and the Institute of Nuclear Physics, Polish Academy of Sciences (IFJ PAN).

Academic programme

The two-week programme consists of more than 50 hours of lectures and hands-on exercises, covering three main themes: physics computing, software engineering, and data technologies. Students who pass the optional final exam will receive a diploma from CSC, as well as ECTS credits from AGH University.

Other activities

However, it's not all study; the social and sport programme is also a vital part of the School. We will have ample opportunities to explore and experience some of the great cultural, historical and natural attractions of Kraków and its region.

Applications are now closed.

Important dates

  • late March - applications open
  • Sunday 8 May (midnight UTC+2 / CEST) - deadline for applications
  • Thursday 9 June - invitations sent to the selected participants
  • Sunday 3 July - registration fee payment deadline
  • Sunday 4 September (afternoon/evening) - student arrivals at Novotel Kraków City West
  • Saturday 17 September (morning) - departure

Who can apply?

The School is aimed at postgraduate students (i.e. holding at least a Bachelor's degree or equivalent), engineers and scientists with a few years' experience in particle physics, in computing, or in related fields. We welcome applications from all countries and nationalities. Limited financial support may be available.

CERN School of Computing
Sunday 4 September
    • 3:00 PM 7:00 PM
      Arrival and registration at the hotel 4h hotel Novotel Krakow City West

      For all participants of the CSC 2022

    • 7:30 PM 8:30 PM
      Dinner 1h hotel Novotel Krakow City West


Monday 5 September
    • 9:00 AM 10:30 AM
      Opening Ceremony 1h 30m
    • 10:45 AM 11:15 AM
      Welcome coffee 30m
    • 11:15 AM 12:15 PM
      Introduction to Physics Computing L1: Hadron Collider Physics 1h

      Here we will focus on the physics of particle collisions, theoretical aspects of the standard model of particle physics, its predictive power as well as its shortcomings. Experimental aspects such as collider facilities and modern particle physics experiments, as well as example physics questions and the corresponding data analyses, will be discussed. Furthermore, the computing models and the resulting volumes of recorded data and simulated Monte Carlo events will be described.

      Speaker: Arnulf Quadt (University of Göttingen)
    • 12:15 PM 12:30 PM
      Announcements 15m
    • 1:00 PM 1:45 PM
      Lunch 45m
    • 2:00 PM 3:00 PM
      Tools and Techniques L1: Introduction 1h

      First, we discuss some of the characteristics of software projects for high energy physics, and some of the issues that arise when people want to contribute to them. We then continue with a brief introduction to software engineering from the perspective of the individual contributor, both as a formal process and how it actually affects what you do. We discuss the examples of unit testing and memory access problems.

      Speaker: Bob Jacobsen (UC Berkeley)
    • 3:00 PM 3:45 PM
      Self-presentation: 1 minute per person 45m
    • 3:45 PM 4:15 PM
      Coffee break 30m
    • 4:15 PM 5:00 PM
      Self-presentation: 1 minute per person 45m
    • 5:00 PM 6:00 PM
      Tools and Techniques L2: Tools for Collaboration, Software Engineering Across the Project 1h

      We continue the track with a discussion of system performance, and what you can (and can't) do to affect it. We examine tools to help with that, discussing how they work and how they can mislead. We then discuss source control as a tool for collaboration. Using examples from basic to large and advanced, we show how individual choices can affect the building of large systems.

      Speaker: Bob Jacobsen (UC Berkeley)
    • 8:00 PM 9:30 PM
      Welcome dinner 1h 30m
Tuesday 6 September
    • 9:00 AM 10:00 AM
      Introduction to Physics Computing L2: Digital Data, Simulation and Reconstruction in Modern Particle Physics Experiments 1h

      Here, a focus will be placed on specific detector sub-components and their data readout concepts as well as data reconstruction techniques, simulation techniques and analysis techniques.

      Speaker: Arnulf Quadt (University of Göttingen)
    • 10:00 AM 11:00 AM
      Data Science L1: Tools for interactive data exploration 1h

      High energy physics has a rich history of interactive exploration of physics data, starting with tools like PAW and ROOT. The explosion of Data Science has created new tools for interactive exploration of large and ad-hoc datasets. This lecture introduces some of these, and shows how they can be used to find new and useful features starting with available data.

      Speaker: Bob Jacobsen (UC Berkeley)
    • 11:00 AM 11:30 AM
      Coffee break 30m
    • 11:30 AM 11:45 AM
      Announcements 15m
    • 11:45 AM 12:45 PM
      Data Science L2: Interactive exploration of non-numeric data 1h

      This lecture continues the exploration of interactive tools, using a learn-by-doing approach. It covers approaches for statistical simulation, geographical analysis, and textual data.

      Speaker: Bob Jacobsen (UC Berkeley)
    • 1:00 PM 1:45 PM
      Lunch 45m
    • 2:00 PM 3:30 PM
      Tools and Techniques - exercises 1h 30m

      The exercises provide some direct experience with the tools and techniques described in the Lectures. Teams of two students will work together on examples designed to show the strengths and weaknesses of various tools and approaches. Basic and advanced exercises are available so that students can work at their own level.

      Speaker: Bob Jacobsen (UC Berkeley)
    • 3:30 PM 3:45 PM
      Coffee break 15m
    • 3:45 PM 5:00 PM
      Tools and Techniques - exercises (continued) 1h 15m

      This is a continuation of the Tools and Techniques exercises

    • 8:00 PM 10:00 PM
      Special dinner and pub quiz 2h
Wednesday 7 September
    • 9:00 AM 10:00 AM
      Data Analysis L1: Introduction 1h

      In this lecture we will explain the main goals of data analysis. We will introduce statistics as a powerful mathematical tool for data analysis. We will define probability and random variables as key concepts in statistics for data analysis.

      Speaker: Toni Sculac (University of Split)
    • 10:00 AM 11:00 AM
      Data Management L1: Setting the scene: Storage technologies, Storage reliability 1h

      The lecture presents the various Storage Models, and the supporting management techniques.

      The lecture will then go into detail on techniques to deliver arbitrary reliability and performance, and discuss solutions for long-term data preservation and the trade-offs between reliability, performance and cost.

      Speaker: Alberto Pace (CERN)
    • 11:00 AM 11:30 AM
      Coffee break 30m
    • 11:30 AM 11:45 AM
      Announcements 15m
    • 11:45 AM 12:45 PM
      Data Management L2: Cryptography, authentication, authorization and accounting 1 1h

      This lecture covers the elements of computer security that are relevant to data management. It addresses the various cryptographic technologies used in data storage systems to ensure data encryption, integrity, confidentiality and access control. The Public Key Infrastructure standard will be described as an example.

      Speaker: Alberto Pace (CERN)
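
      As an illustration of the integrity aspect mentioned in this abstract, the following minimal Python sketch tags a stored object with a keyed hash (HMAC-SHA256) and checks the tag on read. It is not taken from the lecture material; the function names `sign` and `verify` and the sample key are assumptions made for this example.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for `message` under the secret `key`."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Check a tag in constant time to avoid leaking timing information."""
    return hmac.compare_digest(sign(key, message), tag)


if __name__ == "__main__":
    key = b"hypothetical-shared-secret"   # illustrative only, never hard-code real keys
    stored = b"event block 0001"
    tag = sign(key, stored)
    print("intact block verifies:  ", verify(key, stored, tag))
    print("tampered block verifies:", verify(key, stored + b"!", tag))
```

      An HMAC only detects modification by parties who do not hold the key; confidentiality and access control need separate mechanisms.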
    • 1:00 PM 1:45 PM
      Lunch 45m
    • 2:00 PM 3:30 PM
      Student presentations 1h 30m
    • 3:30 PM 4:00 PM
      Coffee break 30m
    • 4:00 PM 6:00 PM
      Data Science - exercises 2h

      The exercises provide hands-on experience in three phases: First, we reiterate some examples from the lectures to give basic experience. A set of intermediate exercises then extends that to some new problem areas. Finally, students can choose one of several larger advanced problems to work through.

      Speaker: Bob Jacobsen (UC Berkeley)
    • 7:30 PM 8:30 PM
      Dinner 1h
    • 8:45 PM 10:00 PM
      Special evening session 1h 15m
Thursday 8 September
    • 9:00 AM 10:00 AM
      Data Management L3: Cryptography, authentication, authorization and accounting 2 1h

      This lecture will continue the discussion on various authentication technologies and then move to authorization. Accounting will also be addressed.

      Speaker: Alberto Pace (CERN)
    • 10:00 AM 11:00 AM
      Data Analysis L2: Probability density functions and Monte Carlo methods 1h

      In this lecture we will discuss what probability density functions (PDFs) are and what their main properties are. We will cover the most important PDFs and their properties for both discrete and continuous random variables. We will discuss the Central Limit Theorem, which explains the importance of the Gaussian distribution. Finally, we will discuss the concept of Monte Carlo methods and their usage in High Energy Physics and Data Analysis.

      Speaker: Toni Sculac (University of Split)
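
      The Central Limit Theorem and the Monte Carlo idea named in this abstract can be tried out together. The sketch below is an illustration written for this page, not lecture material: it draws many sample means of uniform random numbers and compares their spread with the CLT prediction sigma / sqrt(n). The function names and the choice of a uniform(0, 1) distribution are assumptions.

```python
import random
import statistics


def sample_means(n_per_mean, n_experiments, seed=42):
    """Return n_experiments means, each over n_per_mean uniform(0, 1) draws."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.random() for _ in range(n_per_mean))
        for _ in range(n_experiments)
    ]


def clt_prediction(n_per_mean):
    """Mean and standard deviation the CLT predicts for the sample mean."""
    mu = 0.5                       # mean of uniform(0, 1)
    sigma = (1.0 / 12.0) ** 0.5    # standard deviation of uniform(0, 1)
    return mu, sigma / n_per_mean ** 0.5


if __name__ == "__main__":
    means = sample_means(n_per_mean=30, n_experiments=5000)
    mu, sigma = clt_prediction(30)
    print(f"observed : mean={statistics.fmean(means):.4f} "
          f"std={statistics.stdev(means):.4f}")
    print(f"predicted: mean={mu:.4f} std={sigma:.4f}")
```

      Histogramming `means` would show the approximately Gaussian shape even though each individual draw is uniform.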
    • 11:00 AM 11:30 AM
      Coffee break 30m
    • 11:30 AM 11:45 AM
      Announcements 15m
    • 11:45 AM 12:45 PM
      Data Analysis L3: Parameter estimation and confidence intervals 1h

      In this lecture we will introduce the concept of test statistics and estimators. We will explain the key properties of a good estimator and how to obtain one using the Maximum Likelihood and Least Squares methods. We will define confidence intervals and make a strong statement on their statistical interpretation when discussing scientific results. Finally, we will learn how to derive confidence intervals for the Maximum Likelihood and Least Squares methods.

      Speaker: Toni Sculac (University of Split)
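
      A small worked case of Maximum Likelihood estimation, assuming an exponential decay-time model f(t; tau) = exp(-t / tau) / tau. This is a sketch chosen for illustration, not the lecture's own example; the function names are hypothetical. For this model the estimate has a closed form (the sample mean), and its standard deviation tau / sqrt(n) gives an approximate 68% interval for large n.

```python
import math
import random


def generate_lifetimes(tau, n, seed=1):
    """Draw n decay times from an exponential distribution with mean tau."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / tau) for _ in range(n)]


def log_likelihood(tau, samples):
    """ln L(tau) = -n * ln(tau) - sum(t) / tau for the exponential model."""
    return -len(samples) * math.log(tau) - sum(samples) / tau


def fit_exponential(samples):
    """Return (tau_hat, sigma): the ML estimate and its approximate uncertainty."""
    n = len(samples)
    tau_hat = sum(samples) / n           # solves d(ln L)/d(tau) = 0
    sigma = tau_hat / math.sqrt(n)       # sqrt(Var[tau_hat]), evaluated at tau_hat
    return tau_hat, sigma


if __name__ == "__main__":
    data = generate_lifetimes(tau=2.0, n=10_000)
    tau_hat, sigma = fit_exponential(data)
    print(f"tau = {tau_hat:.3f} +/- {sigma:.3f} (approx. 68% interval)")
```

      For models without a closed-form solution, the same `log_likelihood` function would be maximised numerically instead.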
    • 1:00 PM 1:45 PM
      Lunch 45m
    • 2:00 PM 6:00 PM
      Free time 4h
    • 7:30 PM 8:30 PM
      Dinner 1h
Friday 9 September
    • 9:00 AM 10:00 AM
      Data Management L4: Distributed Hash Tables, Data Replication, Caching, Monitoring, Alarms and Quota 1 1h

      This lecture describes the various possible technologies used to implement distributed hash tables, data workflows and complex data transfer processes. It also discusses problems with data caching and garbage collection, and concludes with monitoring and quota enforcement.

      Speaker: Alberto Pace (CERN)
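
      One common way to build the key-to-node mapping of a distributed hash table is consistent hashing. The sketch below assumes that technique for illustration; the abstract does not say which scheme the lecture presents. The class name `HashRing`, the node names and the number of virtual points per node are hypothetical.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a point on the ring (a 64-bit integer)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Consistent-hash ring: each node owns several points on the ring."""

    def __init__(self, nodes, replicas=64):
        self._replicas = replicas
        self._points = []    # sorted ring positions
        self._owner = {}     # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self._replicas):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove(self, node):
        for i in range(self._replicas):
            point = _hash(f"{node}#{i}")
            self._points.remove(point)
            del self._owner[point]

    def lookup(self, key):
        """Return the node owning the first ring point after the key's hash."""
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    print("file-42 is stored on", ring.lookup("file-42"))
```

      The property that makes this useful for storage is that removing a node only relocates the keys that node owned; all other keys keep their location.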
    • 10:00 AM 11:00 AM
      Guest lecture: Heterogeneous computing 1h
      Speaker: Tomasz Szumlak (AGH University of Science and Technology (PL))
    • 11:00 AM 11:30 AM
      Coffee break 30m
    • 11:30 AM 11:45 AM
      Announcements 15m
    • 11:45 AM 12:45 PM
      Software Security L1: Introduction 1h

      The first lecture starts with a definition of computer security and an explanation of why it is so difficult to achieve. The lecture highlights the importance of proper threat modelling and risk assessment. It then presents three complementary methods of mitigating threats: protection, detection, reaction; and tries to prove that security through obscurity is not a good choice.

      Speaker: Sebastian Lopienski (CERN)
    • 1:00 PM 1:45 PM
      Lunch 45m
    • 2:00 PM 3:00 PM
      Data Analysis L4: Hypothesis testing and p-value 1h

      We will learn about the hypothesis testing procedure and all of its key concepts. We will discuss how to choose a critical region and learn about errors of the first and second kind. We will learn the importance of a blinded analysis and understand all the steps needed before looking at the data. Finally, we will discuss when we can announce a discovery in science and the concept of a p-value.

      Speaker: Toni Sculac (University of Split)
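
      To make the p-value concept concrete, here is a minimal sketch for a simple counting experiment, assuming a known Poisson background expectation and no systematic uncertainties. It is an illustration written for this page rather than lecture material; the function names and the numbers in the usage block are hypothetical.

```python
import math
from statistics import NormalDist


def poisson_p_value(n_obs, background):
    """One-sided p-value P(N >= n_obs) for N ~ Poisson(background)."""
    term = math.exp(-background)     # P(N = 0)
    cdf_below = 0.0                  # accumulates P(N < n_obs)
    for k in range(n_obs):
        cdf_below += term
        term *= background / (k + 1)  # P(N = k + 1) from P(N = k)
    return max(0.0, 1.0 - cdf_below)


def significance(p_value):
    """One-sided Gaussian significance Z for a p-value with 0 < p < 1."""
    # By symmetry Z = -Phi^{-1}(p); this form stays accurate for tiny p.
    return -NormalDist().inv_cdf(p_value)


if __name__ == "__main__":
    p = poisson_p_value(n_obs=12, background=3.2)   # illustrative numbers
    print(f"p-value = {p:.3e}, Z = {significance(p):.2f} sigma")
```

      In particle physics a discovery is conventionally claimed at Z = 5, which corresponds to a one-sided p-value of about 2.87e-7.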
    • 3:00 PM 4:00 PM
      Data Analysis - exercises 1h

      There will be 3 sets of exercises covering basic properties of PDFs and Monte Carlo generators, Maximum Likelihood fit, and Hypothesis testing. Students will be given realistic but simplified problems where key concepts from statistics need to be applied in order to provide scientific interpretation of data. Each set of exercises consists of 5 problems that will help guide the student. Data is provided in a simple text file and can be analysed with any programming language that offers libraries for statistical analysis (Python or C++ are recommended).

      Speaker: Toni Sculac (University of Split)
    • 4:00 PM 4:15 PM
      Coffee break 15m
    • 4:15 PM 6:00 PM
      Data Analysis - exercises (continued) 1h 45m

      This is a continuation of the Data Analysis exercises

    • 7:30 PM 8:30 PM
      Dinner 1h
Saturday 10 September
    • 9:00 AM 11:00 AM
      (optional) CUDA training 2h

      Fundamentals of Accelerated Computing with CUDA C/C++

      This optional half-day course will allow you to learn how to accelerate and optimise existing C/C++ CPU-only applications to leverage the power of GPUs using innovative and modern CUDA techniques. It is also an excellent way to start working with highly optimised professional tools such as the Nsight integrated development environment with its graphical profiler. To start your journey into the massively parallel world, you will need basic C/C++ competency, including familiarity with variable types, loops, functions, arrays, etc.

      This course, kindly organized by AGH University of Science and Technology (the hosting university), is offered to the participants of CSC 2022 for free (the usual fee is approximately 100 USD per person with a non-profit academic background). The promotion code which unlocks the materials and computation time in the NVIDIA cloud will be given to you at the beginning of the course. The materials can be accessed and run in the cloud for approximately six months after the course. It is possible to get an official Certificate of Competency (CoC) issued by NVIDIA after completing the exam session (at the end of the course day, or at any convenient time up to six months after the CSC 2022).

      Speaker: Tomasz Szumlak (AGH University of Science and Technology (PL))
    • 11:00 AM 11:15 AM
      Coffee break 15m
    • 11:15 AM 12:45 PM
      (optional) CUDA training (continued) 1h 30m

      This is a continuation of the CUDA training.

    • 1:00 PM 1:45 PM
      Lunch 45m
    • 2:00 PM 4:30 PM
      (optional) CUDA training (continued) 2h 30m

      This is a continuation of the CUDA training.

    • 7:30 PM 8:30 PM
      Dinner 1h
Sunday 11 September
    • 9:00 AM 9:00 PM
      Excursion to Wieliczka salt mine + guided visit to Kraków 12h
Monday 12 September
    • 9:00 AM 10:00 AM
      Machine Learning L1 1h
      Speakers: Kamila Kalecinska (AGH), Tomasz Szumlak (AGH)
    • 10:00 AM 11:00 AM
      Data Management L5: Distributed Hash Tables, Data Replication, Caching, Monitoring, Alarms and Quota 2 1h

      This lecture concludes the description of the various possible technologies used to implement distributed hash tables, data workflows and complex data transfer processes. It also discusses problems with data caching and garbage collection, and concludes with monitoring and quota enforcement.

      Speaker: Alberto Pace (CERN)
    • 11:00 AM 11:30 AM
      Coffee break 30m
    • 11:30 AM 11:45 AM
      Announcements 15m
    • 11:45 AM 12:45 PM
      Software Security L2: Security in different phases of software development 1h

      The second lecture addresses the following question: how to create secure software? It introduces the main security principles (like least privilege, or defense in depth) and discusses security in different phases of the software development cycle. The emphasis is put on the implementation part: the most common pitfalls and security bugs are listed, followed by advice on best practices for secure development.

      Speaker: Sebastian Lopienski (CERN)
    • 1:00 PM 1:45 PM
      Lunch 45m
    • 2:00 PM 2:15 PM
      CSC School Photo 15m
    • 2:15 PM 3:30 PM
      Software Security - exercises 1h 15m

      In the practice session, a range of typical security vulnerabilities will be presented. The goal is to learn how they can be exploited (for privilege escalation, data confidentiality compromise etc.), how to correct them, and how to avoid them in the first place! Students will be given small pieces of source code in different programming languages, and will be asked to find vulnerabilities and fix them. The online course documentation will gradually reveal more and more information to help students in this task. Additionally, students will have a chance to try several source code analysis tools, and see how such tools can help them find functionality bugs and security vulnerabilities.

      Speaker: Sebastian Lopienski (CERN)
    • 3:30 PM 3:45 PM
      Coffee break 15m
    • 3:45 PM 5:00 PM
      Software Security - exercises (continued) 1h 15m

      This is a continuation of the Software Security exercises

    • 7:30 PM 8:30 PM
      Dinner 1h
    • 8:45 PM 10:00 PM
      Special evening talk: When Internet history meets philosophy 1h 15m
      Speaker: Francois Fluckiger
Tuesday 13 September
    • 9:00 AM 10:00 AM
      Software Design L1: Parallelism in a Modern HEP Data Processing Framework 1h

      Even though the miniaturization of transistors on chips continues as predicted by Moore's law, computer hardware is starting to face scaling issues, so-called performance 'walls'. Probably the best known is the 'power wall', which limits clock frequencies. One of the remaining ways to increase processor performance is to integrate many cores on the same chip. At the same time, the upcoming LHC upgrade will drastically increase the required CPU power. Both problems challenge the current approach to software design in high energy physics (HEP): developers are forced to rethink their software design and need to move to massively parallel applications. This lecture will explain the current HEP software design, the hardware and physics issues that need to be tackled, and possible approaches to achieve the required level of parallelization.

      Speaker: Stephan Hageboeck (CERN)
    • 10:00 AM 11:00 AM
      Machine Learning L2 1h
      Speakers: Kamila Kalecinska (AGH), Tomasz Szumlak (AGH)
    • 11:00 AM 11:30 AM
      Coffee break 30m
    • 11:30 AM 11:45 AM
      Announcements 15m
    • 11:45 AM 12:45 PM
      Data Technologies: Introduction 1h

      The lecture will introduce the basic concepts of IO systems, protocols and data storage models in preparation for the data technology exercises.

      Speaker: Andreas Joachim Peters (CERN)
    • 1:00 PM 1:45 PM
      Lunch 45m
    • 2:00 PM 3:30 PM
      Data Technologies - exercises 1h 30m

      The first part of hands-on exercises aims to improve understanding of basic parameters in IO systems:
      • network and media latency
      • access patterns
      • OS caching
      • bottlenecks and optimization strategies for local and remote data access.
      A few essential Linux tools will be introduced to monitor and measure IO performance while avoiding the bias introduced by OS caching. Students will experience and measure the impact of latency and access patterns on IO performance.
      The second part covers the concepts of parallelism and redundancy in storage systems. We will apply the technology of Cloud storage systems to store and retrieve files in our local desktop cluster, using a distributed hash table to locate files or file fragments and a REST interface to do GET, PUT or DELETE operations on these.
      The exercises conclude with the implementation and performance tuning of a RAID verification algorithm.

      Speaker: Andreas Joachim Peters (CERN)
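
      The abstract does not specify which RAID scheme the verification exercise uses. As one possible illustration, the sketch below assumes single XOR parity (as in RAID 5): the parity block is the byte-wise XOR of the data blocks, verification recomputes it, and a single lost block can be rebuilt from the survivors. All function names are hypothetical.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of a list of equally sized byte blocks."""
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    out = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def verify_stripe(data_blocks, parity):
    """True if the stored parity matches the parity recomputed from the data."""
    return xor_blocks(data_blocks) == parity


def rebuild_block(surviving_blocks, parity):
    """Recover the single missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    stripe = [b"abcd", b"efgh", b"ijkl"]
    parity = xor_blocks(stripe)
    print("stripe consistent:", verify_stripe(stripe, parity))
    print("rebuilt block 1:  ", rebuild_block([stripe[0], stripe[2]], parity))
```

      The pure-Python inner loop is deliberately naive; it is the kind of hot spot that performance tuning would target, for instance by processing whole machine words or using vectorised operations.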
    • 3:30 PM 3:45 PM
      Coffee break 15m
    • 3:45 PM 5:00 PM
      Data Technologies - exercises (continued) 1h 15m

      This is a continuation of the Data Technologies exercises.

    • 5:00 PM 7:00 PM
      Traditional CSC football match 2h
    • 7:30 PM 8:30 PM
      Dinner 1h
Wednesday 14 September
    • 9:00 AM 10:00 AM
      Software Design L2: Base Concepts of Parallel Programming: A Pragmatic Approach 1h

      This and the following lecture will explain the concepts behind various parallelization methodologies.
      First, a theoretical introduction to threads, thread safety and concurrent data access will be given. As the new C++ standard (C++11) now provides built-in support for parallel programming, the new features of this standard will be shown. Finally, concrete solutions to the theoretical problems will be discussed.

      Speaker: Andrei Gheata (CERN)
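
      The lecture itself works with C++11. Purely as a language-neutral illustration of thread-safe concurrent data access, the following Python sketch protects a shared counter with a lock, which plays the role that `std::mutex` plays in C++11. The class and function names are assumptions made for this example.

```python
import threading


class Counter:
    """A counter whose read-modify-write update is guarded by a lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads could read the same old value
        # and one of the two updates would be lost.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def run_workers(n_threads, n_increments):
    """Increment one shared counter from several threads and return the total."""
    counter = Counter()

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


if __name__ == "__main__":
    print("total:", run_workers(n_threads=4, n_increments=10_000))
```

      Because every update happens under the lock, the total always equals the number of threads times the number of increments per thread.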
    • 10:00 AM 11:00 AM
      Machine Learning L3 1h
      Speakers: Kamila Kalecinska (AGH), Tomasz Szumlak (AGH)
    • 11:00 AM 11:30 AM
      Coffee break 30m
    • 11:30 AM 11:45 AM
      Announcements 15m
    • 11:45 AM 12:45 PM
      Software Security L3: Web application security, exercise debriefing 1h

      This third hour consists of a debriefing of the exercises, in particular those that are web-related. Various vulnerabilities typical of web applications (such as cross-site scripting, SQL injection, cross-site request forgery etc.) are introduced and discussed.

      Speaker: Sebastian Lopienski (CERN)
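
      SQL injection, one of the vulnerabilities listed in this abstract, can be demonstrated in a few lines. The sketch below is not from the course exercises; it uses Python's standard `sqlite3` module with an in-memory database, and the table layout, function names and payload are hypothetical. It contrasts a query built by string formatting with a parameterised query.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two illustrative user rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def find_user_unsafe(conn, name):
    # VULNERABLE: user input is pasted into the SQL text itself.
    query = f"SELECT name, secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # The driver passes `name` as data, so it is never parsed as SQL.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "nobody' OR '1'='1"
    print("unsafe query returns", len(find_user_unsafe(conn, payload)), "rows")
    print("safe query returns  ", len(find_user_safe(conn, payload)), "rows")
```

      With the payload shown, the unsafe query's WHERE clause becomes always true and leaks every row, while the parameterised query looks for a user literally named by the payload and finds none.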
    • 1:00 PM 1:45 PM
      Lunch 45m
    • 2:00 PM 3:30 PM
      Machine Learning - exercises 1h 30m
      Speakers: Kamila Kalecinska (AGH), Tomasz Szumlak (AGH)
    • 3:30 PM 3:45 PM
      Coffee break 15m
    • 3:45 PM 5:00 PM
      Machine Learning - exercises (continued) 1h 15m

      This is a continuation of the Machine Learning exercises

    • 7:30 PM 8:30 PM
      Dinner 1h
Thursday 15 September
    • 9:00 AM 10:00 AM
      Software Design L3: Understanding, Debugging and Profiling a Complex Multithreaded Application 1h

      Dealing with a parallel application is complex. We need procedures that raise fences to protect against mistakes, such as static analysis tools that find bugs automatically. We also need tools to inspect and manipulate the behavior of programs at runtime, such as the GDB debugger. Finally, profilers such as igprof can help us understand the performance bottlenecks of an application and gain more insight into its efficiency. The objective of this lecture is to become familiar with these tools and be able to apply them to multithreaded programs.

      Speaker: Andrei Gheata (CERN)
    • 10:00 AM 11:00 AM
      Software Design L4: Patterns for Parallel Software Development 1h

      This lecture will present a set of common patterns in parallel programming. The sequential origin of these patterns will be discussed, as well as the restrictions that they impose. A particularly successful combination of patterns, Map-Reduce, will be described in detail and examples of its everyday use at large scale will be given. In addition, it will be shown how high-level features like C++ lambdas, the TBB library or the Spark framework can help you get started with the aforementioned parallel patterns.

      Speaker: Stephan Hageboeck (CERN)
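
      The Map-Reduce pattern named in this abstract can be sketched without any framework. The example below is a single-process Python illustration written for this page (the exercises use Spark, which distributes the same three phases across machines); the word-count task and the function names are assumptions.

```python
from collections import defaultdict
from functools import reduce


def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key with an associative operation."""
    return {
        key: reduce(lambda a, b: a + b, values)
        for key, values in groups.items()
    }


def word_count(documents):
    return reduce_phase(shuffle(map_phase(documents)))


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog", "The end"]))
```

      Because the map step treats each document independently and the reduce operation is associative, both phases can be split across workers without changing the result.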
    • 11:00 AM 11:30 AM
      Coffee break 30m
    • 11:30 AM 11:45 AM
      Announcements 15m
    • 11:45 AM 12:45 PM
      Data Visualization L1: The Theory Behind Data Visualization 1h

      In this lecture, we introduce the basic concepts behind data visualization, what we are visualizing, why we are visualizing it, and how we can visualize data more effectively.

      Speaker: Eamonn Maguire (Proton)
    • 1:00 PM 1:45 PM
      Lunch 45m
    • 2:00 PM 3:30 PM
      Software Design - exercises 1h 30m

      The exercises will cover the topics of lectures 1 to 4 on a hands-on basis, using C++11, TBB and Spark. They include examples of the new C++11 functionality related to threads and thread safety. In addition, there will be examples of concurrent access to data, lock-based and lock-free data structures, and task-based programming. Finally, there will be an exercise to practise the Map-Reduce pattern using the Spark parallel data processing framework.

      Speakers: Andrei Gheata (CERN), Stephan Hageboeck (CERN)
    • 3:30 PM 3:45 PM
      Coffee break 15m
    • 3:45 PM 5:00 PM
      Software Design - exercises (continued) 1h 15m

      This is a continuation of the Software Design exercises.

    • 7:30 PM 8:30 PM
      Dinner 1h
Friday 16 September
    • 9:00 AM 10:00 AM
      Data Visualization L2: Practical Applications of Theory and Multi-Dimensional Data Visualization 1h

      In this lecture we apply some of what we learned in Lecture 1 and also introduce the visualization of multi-dimensional data.

      Speaker: Eamonn Maguire (Proton)
    • 10:00 AM 10:15 AM
      Announcements 15m
    • 10:15 AM 10:45 AM
      Coffee break 30m
    • 10:45 AM 12:45 PM
      Data Visualization - exercises 2h
      Speaker: Eamonn Maguire (Proton)
    • 1:00 PM 1:45 PM
      Lunch 45m
    • 2:30 PM 3:30 PM
      CSC examination 1h
    • 3:30 PM 4:00 PM
      Coffee break 30m
    • 4:15 PM 5:45 PM
      Graduation and closing ceremony 1h 30m
    • 8:00 PM 10:00 PM
      Closing dinner 2h
Saturday 17 September
    • 9:00 AM 10:30 AM
      Departure 1h 30m