The 14th Inverted CERN School of Computing (iCSC 2023) consists of classes (lectures, exercises, demonstrations and consultations) given by former CERN School of Computing students. The Inverted School provides a platform to share their knowledge by turning students into teachers. More information on the Inverted CSC events can be found at https://csc.web.cern.ch/schools/invertedschool/.
The school will take place on March 6-9, 2023 as a hybrid event at CERN and on Zoom. The event will be recorded.
Registrations are closed.



These days, the "cloud" is the default environment for deploying new applications.
Frequently cited benefits are lower cost, greater elasticity and less maintenance overhead.
However, for many people "using the cloud" means following obscure deployment steps that might seem like black magic.
This course aims to make newcomers familiar with cloud-native technology (building container images, deploying applications on Kubernetes etc.) as well as explain the fundamental concepts behind it (microservices, separation of concerns and least privilege, fault tolerance).
In particular, the following topics of application development will be covered:
BUILDING: writing applications in a cloud-native way (e.g. to work in an immutable environment) and creating container images according to best practices;
DEPLOYING: using infrastructure-as-code to describe the application deployment (e.g. Helm charts) and using advanced features such as rolling updates and autoscaling;
MONITORING: after multiple containers have been deployed, it is important to keep track of their status and the interaction between the services.
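As a small taste of the BUILDING point: a cloud-native application reads its configuration from the environment rather than from a file baked into the image, so the same immutable image can run in any deployment. A minimal Python sketch (the variable names `APP_PORT` and `APP_LOG_LEVEL` are illustrative, not from the course):

```python
import os

def load_config(env=os.environ):
    """Read settings from the environment, as a cloud-native app would.

    The variable names here are made up for illustration; a real app
    would document its own set of environment variables.
    """
    return {
        "port": int(env.get("APP_PORT", "8080")),
        "log_level": env.get("APP_LOG_LEVEL", "INFO"),
    }

# Injected config (e.g. from a Kubernetes ConfigMap) overrides defaults
config = load_config({"APP_PORT": "9000"})
```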
Message passing is a technique that makes it possible to implement highly performant processing software by splitting computation into pipelines and parallel nodes. However, with the great scalability comes the cost of complexity, which can make such a system difficult to understand, develop and maintain. The lecture will cover the basic principles of message passing in data processing systems and typical problems that may occur when implementing and using such software.
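The pipeline idea can be sketched in a few lines of Python, with a thread and two queues standing in for a processing node and its message channels (a real framework would add many more nodes, back-pressure and error handling):

```python
import queue
import threading

def square_stage(inbox, outbox):
    # Each node consumes messages from its input queue, processes them,
    # and forwards results; None is used here as the shutdown message.
    while True:
        msg = inbox.get()
        if msg is None:
            outbox.put(None)
            break
        outbox.put(msg * msg)

inbox, outbox = queue.Queue(), queue.Queue()
worker = threading.Thread(target=square_stage, args=(inbox, outbox))
worker.start()

for x in [1, 2, 3]:
    inbox.put(x)
inbox.put(None)  # signal end of stream

results = []
while (m := outbox.get()) is not None:
    results.append(m)
worker.join()
```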
Track fitting is an everyday, repetitive task in high-energy-physics detector reconstruction chains. The precision and stability of the fitter depend on the available computing resources: a fit might cost up to half of the CPU time spent on reconstruction. Kalman filters are a widespread solution for track fitting. A classical Kalman filter is a powerful tool that is applicable to linear problems with Gaussian-like errors. In reality, however, one has to deal with non-linear problems and sometimes with non-Gaussian errors. The resulting numerical overhead leads to instabilities and slows down convergence. Physics insight and reparametrisation can help to improve the fit performance. Starting from the simple Kalman filter, we build up a more realistic one, discussing practical tricks and possible implementation issues. We then talk about implementation differences between CPU and GPU.
In these two lectures, we start from the points on planes and follow the entire track-fitting chain up to the high-level particle parameters. We discuss the connection between the geometry of the detector and the track model, as well as the track-fitting chain. We also discuss physics-driven optimization of the algorithms based on the effect of the changes on the high-level parameters.
In the end, we discuss possible implementations of track fitting on CPU and GPU, highlighting the importance of a trade-off between speed and precision.
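For reference, the predict/update cycle of the simplest possible Kalman filter (one-dimensional, linear, Gaussian, static state) fits in a few lines; the noise values below are invented for illustration, not taken from any real fitter:

```python
def kalman_step(x, P, z, Q=0.01, R=1.0):
    """One predict+update step of a 1D Kalman filter.

    x, P : current state estimate and its variance
    z    : new measurement
    Q, R : process and measurement noise variances (made-up values)
    """
    # Predict: the state model is static, uncertainty grows by Q
    x_pred, P_pred = x, P + Q
    # Update: blend prediction and measurement using the Kalman gain
    K = P_pred / (P_pred + R)
    x_new = x_pred + K * (z - x_pred)
    P_new = (1 - K) * P_pred
    return x_new, P_new

# Feed in a few noisy measurements of a quantity near 1.0
x, P = 0.0, 1.0
for z in [1.1, 0.9, 1.05]:
    x, P = kalman_step(x, P, z)
```

Each measurement pulls the estimate toward the data while the variance `P` shrinks, which is exactly the behaviour the more realistic, non-linear filters in the lecture have to preserve.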
This lecture will introduce the concepts of authentication and authorisation and their importance to modern research infrastructures. This will then be built upon by providing an overview of the existing WLCG authentication and authorisation infrastructure (AAI), before taking a deeper look at the token-based AAI the grid is currently transitioning towards, covering the motivations for change, the technologies underpinning the design, and key workflows.
The exercise class for this lecture will provide attendees with the opportunity to obtain tokens from an issuer, and then extract information from the token. This will build upon concepts from the lecture and give handson experience with the technologies underpinning the future of the WLCG AAI.
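As a preview of the exercise, the payload of a JWT-style token can be inspected with the Python standard library alone. The token and claims below are fabricated for illustration, and a real client must of course verify the signature before trusting any claim:

```python
import base64
import json

def decode_jwt_payload(token):
    """Extract the claims from a JWT without verifying the signature.

    A JWT has the layout header.payload.signature, with each part
    base64url-encoded. This is for inspection only, never for trust.
    """
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

# Build a toy token with made-up claims (no real issuer involved)
claims = {"sub": "user123", "scope": "storage.read:/"}
part = lambda d: base64.urlsafe_b64encode(json.dumps(d).encode()).rstrip(b"=").decode()
token = f"{part({'alg': 'none'})}.{part(claims)}.sig"
decoded = decode_jwt_payload(token)
```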
Supervised and unsupervised machine learning have shown great performance in finding mappings between probability distributions, e.g. in classification problems or for artificial data generation. A more difficult class of problems is decision-making, e.g. controlling dynamical systems or building mathematical algorithms, because the framework requires additional time-ordering. Reinforcement learning (RL) has been successful in solving such problems, e.g. in finding strategies for games, optimizing algorithms for high-performance computing, and controlling magnetic fields for nuclear fusion reactors and particle accelerators. In this lecture, I will provide an introduction to the framework, with pedagogical examples, mathematical details, and applications in particle physics. In detail, I will cover: 1) Markov decision processes (MDPs) as the mathematical foundation of RL; 2) Solving small MDPs with tabular methods; 3) Solving large MDPs with policy gradient methods.
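Point 2, tabular methods, fits in a few lines. The sketch below runs value iteration on a made-up four-state chain MDP (all states, rewards and the discount factor are invented for illustration):

```python
# Toy chain MDP: states 0..3, actions step left (-1) or right (+1),
# reaching the terminal state 3 yields reward 1, everything else 0.
GAMMA = 0.9
ACTIONS = (-1, +1)

def step(s, a):
    s2 = min(max(s + a, 0), 3)          # walls at both ends of the chain
    return s2, (1.0 if s2 == 3 else 0.0)

V = [0.0] * 4                           # state values; terminal stays 0
for _ in range(50):                     # sweep until values converge
    for s in range(3):
        # Bellman optimality backup: best action value from state s
        V[s] = max(r + GAMMA * V[s2]
                   for s2, r in (step(s, a) for a in ACTIONS))
```

The values converge to 1, 0.9 and 0.81 for the states at distance one, two and three steps from the goal, showing how the discount factor propagates reward backwards in time.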
The use of hardware accelerators in High Energy Physics (HEP) is becoming increasingly popular, since they are able to significantly reduce the computational time and CPU resources needed for processing and analyzing data. This lecture aims to familiarize the audience with the concept of hardware accelerators and parallel programming. In the first part of the lecture, the concepts of accelerators, coprocessors and heterogeneity will be discussed, with a focus on the Graphics Processing Unit (GPU). An overview of some of the current applications of GPUs in HEP will also be presented. The second part of the lecture will serve as an introduction to CUDA, a programming model designed for general computing on GPUs.
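The central idea of the CUDA model is that one function body (the kernel) runs once per thread index, in parallel over the data. As a CPU-side analogy for intuition only, and emphatically not real GPU code, that pattern can be mimicked in Python:

```python
# A GPU kernel applies the same function to every element in parallel.
# Here threads in a pool play the role of CUDA threads over a 1D grid.
from concurrent.futures import ThreadPoolExecutor

def saxpy_kernel(i, a, x, y, out):
    # In CUDA, this body would run on the device for thread index i
    out[i] = a * x[i] + y[i]

n = 8
x, y, out = list(range(n)), [1.0] * n, [0.0] * n
with ThreadPoolExecutor() as pool:
    # "Launch" the kernel once per index of the grid
    list(pool.map(lambda i: saxpy_kernel(i, 2.0, x, y, out), range(n)))
```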
Domain: Parallel programming
Non-Euclidean data structures are present everywhere in the physical and digital world. Over the last few years, an increasing number of scientific fields have started to leverage the information contained in such data structures with the advent of Geometric Deep Learning. This is also true for High Energy Physics, where Graph Neural Networks are nowadays developed and used for various tasks in different reconstruction steps.
In this lecture, we will first demonstrate the expressive power of graphs as a data structure and introduce the fundamental concepts of graph theory. Then we will discuss Graph Neural Networks and lay the mathematical foundation of the most important neural mechanisms, such as Neural Message Passing or Graph Convolution. Lastly, we will examine applications of Graph Neural Networks in High Energy Physics that make use of the aforementioned technologies.
This lecture aims at the particle physicist who approaches Graph Neural Networks as a practitioner. The main objectives are to illustrate why Graph Neural Networks are powerful deep learning tools and to present the minimum knowledge needed to navigate the computer science literature and apply established technologies to HEP.
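To make Neural Message Passing concrete, here is one aggregation round on a toy graph, with the learned networks replaced by two fixed scalar weights (graph, features and weights are all made up for illustration):

```python
# One round of message passing: each node sums its neighbours' features
# (a permutation-invariant aggregation) and combines the result with its
# own feature via a fixed linear update standing in for a learned network.
graph = {0: [1, 2], 1: [0], 2: [0]}   # adjacency lists of a 3-node graph
features = {0: 1.0, 1: 2.0, 2: 3.0}   # one scalar feature per node

W_SELF, W_NEIGH = 0.5, 0.1            # stand-ins for learned weights

def message_passing_round(graph, h):
    new_h = {}
    for node, neighbours in graph.items():
        aggregated = sum(h[n] for n in neighbours)
        new_h[node] = W_SELF * h[node] + W_NEIGH * aggregated
    return new_h

h1 = message_passing_round(graph, features)
```

Stacking several such rounds lets information travel further across the graph, which is the mechanism behind Graph Convolution layers.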
The C++ language is widely used for state-of-the-art physics analysis code. Source code must be compiled before it can be executed, which involves a number of steps. Although compiler theory is taught in most undergraduate CS courses, real-world compilers carry an aura of mystery as highly complex software products.
This lecture aims to uncover some of those secrets by feeding snippets of C++ code to a compiler, illustrating the different processing steps and dissecting the internal representations, from source to a final binary.
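The lecture dissects a C++ compiler, but the same source-to-IR-to-code pipeline can be glimpsed, as a rough analogy only, in Python's own standard-library tooling:

```python
# Python compiles source to an AST and then to bytecode, analogous in
# spirit to a C++ compiler's parse tree and intermediate representations.
import ast
import dis

source = "def add(a, b):\n    return a + b\n"
tree = ast.parse(source)                 # front end: parse into an AST
code = compile(tree, "<demo>", "exec")   # lower the AST to bytecode

namespace = {}
exec(code, namespace)                    # "link": bind the function
# Inspect the final instruction stream, as one would inspect assembly
instructions = [i.opname for i in dis.get_instructions(namespace["add"])]
```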
In this course, students will learn how to write platform-agnostic code using Python (and some C). Some knowledge (~1 year of experience) of these two languages is recommended.
The lecture will focus on how Python can easily be combined with C for CPU and GPU programming, by exploiting the advantages of both languages. The goal is to introduce three Python libraries that are used at CERN (e.g. in modern multi-particle simulation frameworks): CFFI, CuPy and PyOpenCL. CFFI is a library for Python-C interfacing and CPU kernel execution. CuPy and PyOpenCL are libraries for kernel execution compatible with GPUs. Additionally, there will be a short review of heterogeneous programming and a comparison of the CUDA and OpenCL programming models.
In a subsequent tutorial session the students will be able to play around with these Python libraries.
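The libraries above are covered in the sessions themselves; as a dependency-free appetiser, the standard library's ctypes shows the same Python-to-C bridging idea. The sketch assumes a system C math library can be located, which holds on typical Linux and macOS installations:

```python
# Call a C function (sqrt from the C math library) directly from Python.
import ctypes
import ctypes.util

libm = ctypes.CDLL(ctypes.util.find_library("m"))  # load the C math library
libm.sqrt.argtypes = [ctypes.c_double]  # declare the C signature so ctypes
libm.sqrt.restype = ctypes.c_double     # marshals arguments correctly
root = libm.sqrt(2.0)
```

CFFI offers the same bridge with a friendlier interface and the ability to compile C kernels on the fly, which is what the tutorial exercises build on.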
MLOps: Going from Good to Great
Building a highly performant machine learning model is no small feat. The process requires a well-curated dataset, a suitable algorithm, and finely tuned hyperparameters for that algorithm. Once an ML model reaches a certain degree of maturity and is shared with a broader user base, a new set of operational challenges comes into play. The growing field of MLOps addresses these challenges to ease the friction related to model distribution. In this lecture and exercise session, we will explore and practice the main aspects of MLOps, including but not limited to:
1. Selection and versioning of training datasets
2. Reproducibility of models and computing environments
3. Model encapsulation behind an HTTP API
4. Model versioning and rollout strategies
5. Monitoring of model performance and its drift over time
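As a toy illustration of points 2 and 4, a saved model can carry a version tag and a content hash, so a deployment can verify exactly which artifact it is running. The registry layout below is invented for the example, not a real MLOps tool's format:

```python
import hashlib
import json

def save_model(weights, version):
    """Package model weights as a versioned, content-addressed entry."""
    blob = json.dumps({"weights": weights}, sort_keys=True).encode()
    return {
        "version": version,
        "sha256": hashlib.sha256(blob).hexdigest(),  # reproducibility check
        "artifact": blob,
    }

def load_model(entry):
    """Reload weights, refusing artifacts whose hash does not match."""
    assert hashlib.sha256(entry["artifact"]).hexdigest() == entry["sha256"]
    return json.loads(entry["artifact"])["weights"]

entry = save_model([0.1, 0.2], version="1.0.0")
weights = load_model(entry)
```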
This will be a general overview of quantum computing and what is special about it, spanning two lectures (2 hours) and two practice sessions (2 hours). The mathematical and physics basics will be covered, though not extensively. There will be a discussion of the prospects, with an emphasis on High Energy Physics, as well as a brief look at the shortcomings of quantum computing and the common misrepresentation of facts about the status of the field. The practice sessions will involve using the Qiskit and Pennylane frameworks. The aim of this mini-course is to inspire the students to learn more about the subject and cautiously hype them up to be interested in the CERN quantum technology initiative or to get involved with quantum technologies in general.
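No framework is needed to see the core idea the practice sessions build on: a qubit is just two complex amplitudes, and gates are linear maps on them. A plain-Python sketch:

```python
# Simulate a single qubit: apply a Hadamard gate to |0> and read out
# the measurement probabilities via the Born rule.
import math

def hadamard(state):
    a, b = state
    s = 1 / math.sqrt(2)
    return (s * (a + b), s * (a - b))

state = (1 + 0j, 0 + 0j)       # the |0> basis state
state = hadamard(state)         # equal superposition (|0> + |1>)/sqrt(2)
probs = [abs(amp) ** 2 for amp in state]
```

Frameworks like Qiskit and Pennylane generalize exactly this picture to many qubits, where the state vector grows exponentially and the interesting algorithmic questions begin.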
The Large Hadron Collider (LHC) at CERN has generated a vast amount of information from physics events, reaching peaks of terabytes of data per day. Many reports show that the current analysis models (and, more generally, data processing interfaces) will not be able to efficiently accommodate the data volumes expected in the next few years. It is the responsibility both of the frameworks to provide efficient computing tools and of the users to exploit these resources optimally. The latter is of particular interest in this lecture.
The purpose of this talk is to familiarize students with mechanisms to efficiently profile the performance of C++ and Python applications, working through real-world HEP analyses. The core of the lecture will be the identification of hotspots via perf and techniques for mitigating different kinds of bottlenecks.
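While perf targets native binaries, the Python side of an analysis can get a comparable hotspot overview from the standard library's cProfile; a minimal sketch:

```python
# Profile a deliberately hot function and print the top entries by
# cumulative time, the same view perf's report gives for native code.
import cProfile
import io
import pstats

def hot_loop(n):
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
result = hot_loop(100_000)
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(3)
report = stream.getvalue()
```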
Over the past few years, many advances in the field of Deep Learning (DL) have been achieved, and nowadays modern DL models are starting to be deployed in our everyday life. However, for many safety-critical applications, as well as in scientific research fields, the quantification of the uncertainty of DL model predictions plays a crucial role.
In this lecture, I will introduce the basics of Bayesian Neural Networks, how they can tackle the problem of estimating model uncertainty, and the most common techniques for generalizing this method to deep neural networks.
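The essence of the approach can be shown with a one-weight "network": instead of a single point estimate, keep samples from a distribution over the weight and propagate them into a predictive mean and spread. All numbers below are invented for illustration:

```python
# Approximate a posterior over the single weight of a model y = w * x
# by Monte Carlo samples, then report predictive mean and uncertainty.
import random
import statistics

random.seed(0)
w_samples = [random.gauss(2.0, 0.1) for _ in range(1000)]

def predict(x):
    ys = [w * x for w in w_samples]
    # The spread of the predictions quantifies the model's uncertainty
    return statistics.mean(ys), statistics.stdev(ys)

mean, std = predict(3.0)
```

Bayesian Neural Networks generalize this to millions of weights, where drawing exact posterior samples is intractable and the approximation techniques covered in the lecture become necessary.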