Inverted CERN School of Computing 2025

Europe/Zurich
31/3-004 - IT Amphitheatre (CERN)

31/3-004 - IT Amphitheatre

CERN

105
Show room on map
Description

The Inverted CSC (iCSC) is a four-day school organised at CERN. It consists of lectures and hands-on exercises on a variety of topics, given by former CSC students. The school provides students with a platform to share their knowledge – and by doing so, it effectively "inverts" the roles by turning students into teachers.

This year's iCSC will include topics on Machine Learning and LLMs, HEP computing, best coding practices, parallel computing, and many more.

The school will take place on March 24-27, 2025 as a hybrid event - at CERN and on Zoom. All lectures will be streamed on Zoom; however, the hands-on exercises will not be available over Zoom.

To access and download slides, go to:
https://indico.cern.ch/event/1468713/timetable/?layout=room#20250324.detailed
Click on the paperclip icon, which will give you access to a pptx and pdf version of the slides.

The school is free of charge, and participants from outside CERN are welcome to take part as well. We will not arrange accommodation, travel, or catering (apart from coffee breaks) for participants during the event.

Registration is now closed for the iCSC! If you are already at CERN you are of course welcome to pop in and listen to any of the talks of your choice!


By registering you are not obliged to stay for all lectures; you can attend the ones that interest you the most.

 

CERN School of Computing
Webcast
There is a live webcast for this event
Zoom Meeting ID
69276771372
Host
Andrzej Nowicki
Alternative host
Benoit Loyer
Passcode
32524906
Useful links
Join via phone
Zoom URL
    • 1
      Welcome to the Inverted School of Computing 31/3-004 - IT Amphitheatre

      Speaker: Alberto Pace (CERN)
    • 2
      Federated Learning with CAFEIN for Decentralized, Privacy-Preserving and Secure AI Development 31/3-004 - IT Amphitheatre


      Federated Learning (FL) represents a paradigm shift in artificial intelligence by allowing the training of machine learning algorithms without the need to transfer data or rely on centralized resources. Its decentralized approach to computing and data storage ensures privacy and regulatory compliance while enhancing scalability and robustness compared to traditional systems.
      This talk will provide an introduction to the fundamentals of federated learning, the federated process across vertical and horizontal federations, and aggregation algorithms. We will then delve into key security challenges, including network security for secure communication and model security to prevent adversarial attacks and leakage of sensitive information.
      The discussion will also feature CAFEIN, CERN's federated learning platform, showcasing real-world projects both at CERN and their application in society through industry and academic collaborations.
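      The aggregation step at the heart of federated learning can be sketched in a few lines. Below is a minimal federated-averaging (FedAvg-style) illustration in plain Python; the function name and toy inputs are hypothetical, and this is not CAFEIN's actual API:

```python
# Minimal federated-averaging (FedAvg) sketch: a central server combines
# model parameters trained locally on each client, weighted by the number
# of samples each client holds. Raw data never leaves the clients.

def federated_average(client_params, client_sizes):
    """client_params: one parameter vector (list of floats) per client;
    client_sizes: number of local training samples per client."""
    total = sum(client_sizes)
    n_params = len(client_params[0])
    averaged = [0.0] * n_params
    for params, size in zip(client_params, client_sizes):
        for i, p in enumerate(params):
            averaged[i] += p * size / total
    return averaged

# Two clients with different amounts of data: the larger client
# contributes proportionally more to the global model.
global_model = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])  # → [2.5, 3.5]
```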

      Speaker: Diogo Reis Santos (CERN)
    • 3
      Understanding Large Language Models and their Applications in Code Generation 31/3-004 - IT Amphitheatre


      Lecture

      The field of Deep Learning has experienced a significant turning point in the past years, driven by the emergence and rapid development of Large Language Models (LLMs). These advanced models have not only redefined standards in Natural Language Processing (NLP) but are increasingly being integrated into applications and services due to their natural language capabilities.

      Interest in using LLMs for coding has grown rapidly, and some companies have sought to turn natural language descriptions directly into generated code. However, this trend has already exposed several unresolved challenges in applying LLMs to coding. Despite these difficulties, it has spurred the development of AI-powered code generation tools, such as GitHub Copilot.

      In this lecture, we will introduce Large Language Models and explore the related terminology and core components. We will examine the process of creating an LLM from scratch and highlight the requirements for developing domain-specific LLMs, such as those for code generation. Additionally, we will review the initial footprints of these technologies in code generation and discuss their limitations in this domain. We will also explore strategies and architectural approaches that improve the performance of LLMs generally and for code generation specifically.

      We will conclude by addressing the ethical concerns surrounding LLMs, such as the authorship and ownership of AI-generated code. Finally, we will explore other applications of LLMs in science, particularly within the High Energy Physics community and at CERN.

      Hands-on

      An optional hands-on session may include:

      1. Some exercises to interact with an LLM from Python (either from a small pre-trained model in PyTorch or through API calls).
      2. Some exercises to explore domain-specific models for code generation.
      3. Hands-on exploration and testing of LLM components.
      4. Prompt Engineering strategies or small Fine-Tuning to improve the performance of a model.
      5. Some simplified strategies such as Retrieval Augmented Generation could also be implemented.
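      As a taste of the core components mentioned above, the final step of text generation, turning model scores into a next-token probability distribution, fits in a few lines (a toy softmax-with-temperature sketch; real models do this over vocabularies of tens of thousands of tokens):

```python
import math

# Softmax with temperature: logits (raw scores for each candidate token)
# become probabilities. Lower temperature sharpens the distribution
# toward the top token; higher temperature flattens it.

def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    m = max(scaled)                        # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = {"the": 2.0, "a": 1.0, "cat": 0.1}
probs = softmax(list(logits.values()), temperature=0.5)
# At temperature 0.5 the top-scoring token "the" dominates even more strongly
# than at temperature 1.0.
```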
      Speaker: Andrea Valenzuela Ramirez (CERN)
    • 11:15
      Coffee and networking 31/3-004 - IT Amphitheatre

    • 4
      LLMs in Production: RAG pipelines and beyond 31/3-004 - IT Amphitheatre


      Running Large Language Models (LLMs) in production presents many complexities that extend far beyond the choice of model. Key challenges include:

      • How do you address knowledge staleness (i.e. the model having been trained on out-of-date or irrelevant information)?
      • How do you balance cost optimisation with model latency?
      • How do you reduce bias and factual hallucinations?

      A widely adopted approach to address these is Retrieval Augmented Generation (RAG).

      RAG pipelines implement a two-tiered approach (retrieval, then generation), supplying models with domain-specific information before they generate a response to a question. Through these techniques, "off-the-shelf" LLMs can be applied to a much wider domain context than they were originally trained for.

      In this lecture, we will explore how to improve the adaptability of LLMs without the need for fine-tuning: covering RAG and related architectures, physics-based approaches like entropix that allow for self-reasoning / context aware sampling, and the challenges with applying these techniques in a production context.
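      The retrieval tier can be illustrated with a toy example. Real pipelines use embedding models and vector stores, but a simple word-overlap scorer already shows the idea; all names below are hypothetical:

```python
# Toy retrieval step of a RAG pipeline: rank documents by word overlap
# with the query, then prepend the best match to the prompt so the LLM
# answers from fresh, domain-specific context.

def retrieve(query, documents):
    q_words = set(query.lower().split())
    # Score each document by how many query words it shares.
    scored = [(len(q_words & set(d.lower().split())), d) for d in documents]
    return max(scored)[1]

def build_prompt(query, documents):
    context = retrieve(query, documents)
    return f"Context: {context}\nQuestion: {query}\nAnswer:"

docs = [
    "The LHC restarted in 2022 after a long shutdown.",
    "Python is a popular programming language.",
]
prompt = build_prompt("When did the LHC restart?", docs)
```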

      Speaker: Jack Charlie Munday (CERN)
    • 12:40
      Lunch and networking

      Grab your lunch in R2 and continue discussing over lunch!

    • 5
      The Algorithm Advantage: Outperforming Hardware with Smarter Code 31/3-004 - IT Amphitheatre


      When tackling software performance, it's easy to prioritize hardware optimizations like CPU multithreading or GPU programming. However, a well-chosen algorithm often delivers more significant improvements than any hardware adjustment.
      In this lecture, we will begin by demystifying Big-O Notation, a cornerstone for evaluating algorithm efficiency. From there, we will explore algorithms tailored for array operations, starting with a comparative analysis of popular sorting techniques. Their strengths, weaknesses, and use cases will be highlighted to provide practical insights.
      Next, we will shift focus to the versatile "two pointers" technique, a powerful paradigm for solving complex problems involving dynamic data structures efficiently.
      This session is technology-agnostic, offering valuable takeaways for anyone working with dynamic data structures, regardless of their programming language or technology stack. Whether you are a physicist analyzing data or a developer optimizing code, this lecture will equip you with foundational tools for smarter problem-solving.
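      As a taste of the two-pointers paradigm, here is the classic pair-sum search on a sorted array, which runs in O(n) where the naive check of every pair is O(n²):

```python
# Two-pointers technique: find two numbers in a sorted array that add up
# to a target. Moving one pointer inward per step gives O(n) time,
# versus O(n^2) for checking every pair.

def pair_sum(sorted_nums, target):
    lo, hi = 0, len(sorted_nums) - 1
    while lo < hi:
        s = sorted_nums[lo] + sorted_nums[hi]
        if s == target:
            return sorted_nums[lo], sorted_nums[hi]
        elif s < target:
            lo += 1   # sum too small: advance the left pointer
        else:
            hi -= 1   # sum too large: retreat the right pointer
    return None

result = pair_sum([1, 3, 5, 8, 11], 13)  # → (5, 8)
```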

      Speaker: Andrea Germinario
    • 6
      WASM: the future of computing 31/3-004 - IT Amphitheatre


      WASM (WebAssembly) is a technology that defines a standard binary format that can run anywhere, regardless of the architecture or the operating system.
      Although it started as a technology to compile code for the browser (as the name suggests), WASM is expanding far beyond its original use case. Its capability to run workloads on any hardware (x86, ARM, GPU, NPU) and any operating system (Linux, Windows, browsers) makes it ideal for scientific computing, LLMs, blockchain, and IoT devices. Furthermore, WASM isolation techniques are slowly replacing containers.
      In this series of lectures, we will first take a look at WASM technology and its use cases, and then try it out with hands-on exercises.

      Speaker: Alberto Pimpo
    • 16:00
      Coffee and networking 31/3-004 - IT Amphitheatre

      31/3-004 - IT Amphitheatre

      CERN

      105
      Show room on map
    • 7
      Exercise: The Algorithm Advantage: Outperforming Hardware with Smarter Code 513-1-024

      When tackling software performance, it's easy to prioritize hardware optimizations like CPU multithreading or GPU programming. However, a well-chosen algorithm often delivers more significant improvements than any hardware adjustment.
      In this lecture, we will begin by demystifying Big-O Notation, a cornerstone for evaluating algorithm efficiency. From there, we will explore algorithms tailored for array operations, starting with a comparative analysis of popular sorting techniques. Their strengths, weaknesses, and use cases will be highlighted to provide practical insights.
      Next, we will shift focus to the versatile "two pointers" technique, a powerful paradigm for solving complex problems involving dynamic data structures efficiently.
      This session is technology-agnostic, offering valuable takeaways for anyone working with dynamic data structures, regardless of their programming language or technology stack. Whether you are a physicist analyzing data or a developer optimizing code, this lecture will equip you with foundational tools for smarter problem-solving.

      Speaker: Andrea Germinario
    • 8
      Exercise: WASM: the future of computing 31/3-004 - IT Amphitheatre


      WASM (WebAssembly) is a technology that defines a standard binary format that can run anywhere, regardless of the architecture or the operating system.
      Although it started as a technology to compile code for the browser (as the name suggests), WASM is expanding far beyond its original use case. Its capability to run workloads on any hardware (x86, ARM, GPU, NPU) and any operating system (Linux, Windows, browsers) makes it ideal for scientific computing, LLMs, blockchain, and IoT devices. Furthermore, WASM isolation techniques are slowly replacing containers.
      In this series of lectures, we will first take a look at WASM technology and its use cases, and then try it out with hands-on exercises.

      Speaker: Alberto Pimpo
    • 9
      Under the Hood of the Snake: Behind the scenes of Python 31/3-004 - IT Amphitheatre


      Python is one of the most used programming languages in scientific computing - yet when discussed, we often admonish: "Python is slow!"

      What does that mean in practice? Does it depend on the use case? Most importantly: what is Python doing under the hood?

      Let's put our fangs into the inner workings of Python. I will talk about its design and what "Python is slow" means, what we are comparing to, and when this matters. Through examples we will see these things in action - both where the language is limited, and where the limitation might be your code...

      Python is very flexible, which comes with trade-offs compared to languages like C++. However, "slow" is not an absolute term. There are good practices that can be applied to speed up our programs, saving both time and computing resources. With the right techniques, we can learn to charm the snake!
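      One such practice can be shown in a few lines: moving a hot loop from interpreted bytecode into a C-implemented built-in (a minimal sketch; absolute timings vary by machine):

```python
import timeit

# Summing a million integers: an explicit Python loop executes one
# bytecode dispatch per element, while the built-in sum() runs the
# equivalent loop in C inside CPython.

def loop_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total

def builtin_sum(n):
    return sum(range(n))

n = 1_000_000
assert loop_sum(n) == builtin_sum(n)  # same answer...
t_loop = timeit.timeit(lambda: loop_sum(n), number=5)
t_builtin = timeit.timeit(lambda: builtin_sum(n), number=5)
# ...but the built-in version is typically several times faster.
```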

      Speaker: Sten Astrand (Lund University)
    • 10
      Reinforcement Learning in Particle Accelerators: a practical example (1/2) 31/3-004 - IT Amphitheatre


      Reinforcement Learning for Particle Accelerator Control: A Real-World Example

      Hour 1: Introduction to Reinforcement Learning for Particle Accelerators

      • Basic Concepts:
        • Overview of Reinforcement Learning (RL) fundamentals.
        • Definitions and distinctions:
          • Model-free vs. model-based.
          • Off-policy vs. on-policy approaches.
      • Applications and Considerations:
        • Discussion of problem types and environmental variables affecting model selection in practical scenarios.
        • Analysis of drawbacks and benefits of different RL architectures.
      • Practical examples:
        • Real-world examples of RL in particle accelerators (e.g., CERN).
        • Case study introduction: Optimization of RF triple splittings in the Proton Synchrotron (PS).

      Hour 2: Optimizing RF Triple Splittings with Reinforcement Learning

      • Problem Definition:
        • Explanation of PS RF operations and the triple splitting optimization challenge for LHC-type beams.
        • Overview of the physics and parameters involved in optimization.
      • Optimization Approach:
        • Justification for choosing RL and specific RL architectures.
        • Step-by-step walkthrough:
          • Initial simulations and trials.
          • Challenges and lessons learned.
          • Final operational solution deployed in the control room.

      Exercise Session: Training RL Agents for RF Optimization (1 hour)

      • Objective:
        • Train RL agents to optimize RF double splitting settings in simulation for improved beam quality.
      • Implementation:
        • Use SWAN notebooks with provided skeleton code.
        • Define a custom gymnasium environment for the double splitting problem, given:
          • Pre-implemented simulation data loaders.
          • Basic loss function for optimization.
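      For a flavour of what such a custom environment involves, here is a minimal class following the Gymnasium reset()/step() interface; the one-parameter "RF setting" physics is a hypothetical stand-in, not the actual SWAN exercise code:

```python
import random

# Toy RL environment following the Gymnasium reset()/step() interface.
# The agent nudges a single "RF setting" toward an unknown optimum; the
# reward is the negative squared distance (a stand-in for beam quality).

class ToySplittingEnv:
    def __init__(self, optimum=0.7, seed=0):
        self.optimum = optimum
        self.rng = random.Random(seed)
        self.setting = 0.0

    def reset(self):
        self.setting = self.rng.uniform(0.0, 1.0)
        return self.setting, {}           # observation, info

    def step(self, action):
        self.setting += action            # apply the agent's adjustment
        reward = -(self.setting - self.optimum) ** 2
        terminated = abs(self.setting - self.optimum) < 0.01
        return self.setting, reward, terminated, False, {}

env = ToySplittingEnv()
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(0.1)
```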
      Speaker: Joel Axel Wulff
    • 11:00
      Coffee and networking 31/3-004 - IT Amphitheatre

    • 11
      Reinforcement Learning in Particle Accelerators: a practical example (2/2) 31/3-004 - IT Amphitheatre


      Reinforcement Learning for Particle Accelerator Control: A Real-World Example

      Hour 1: Introduction to Reinforcement Learning for Particle Accelerators

      • Basic Concepts:
        • Overview of Reinforcement Learning (RL) fundamentals.
        • Definitions and distinctions:
          • Model-free vs. model-based.
          • Off-policy vs. on-policy approaches.
      • Applications and Considerations:
        • Discussion of problem types and environmental variables affecting model selection in practical scenarios.
        • Analysis of drawbacks and benefits of different RL architectures.
      • Practical examples:
        • Real-world examples of RL in particle accelerators (e.g., CERN).
        • Case study introduction: Optimization of RF triple splittings in the Proton Synchrotron (PS).

      Hour 2: Optimizing RF Triple Splittings with Reinforcement Learning

      • Problem Definition:
        • Explanation of PS RF operations and the triple splitting optimization challenge for LHC-type beams.
        • Overview of the physics and parameters involved in optimization.
      • Optimization Approach:
        • Justification for choosing RL and specific RL architectures.
        • Step-by-step walkthrough:
          • Initial simulations and trials.
          • Challenges and lessons learned.
          • Final operational solution deployed in the control room.

      Exercise Session: Training RL Agents for RF Optimization (1 hour)

      • Objective:
        • Train RL agents to optimize RF double splitting settings in simulation for improved beam quality.
      • Implementation:
        • Use SWAN notebooks with provided skeleton code.
        • Define a custom gymnasium environment for the double splitting problem, given:
          • Pre-implemented simulation data loaders.
          • Basic loss function for optimization.
      Speaker: Joel Axel Wulff
    • 12:30
      Lunch and networking
    • 12
      Code You Won’t Regret (Too Much): How to Write Maintainable Code 31/3-004 - IT Amphitheatre


      If you have ever worked with an existing code base, you have likely realised how frustrating and time-consuming this can be. Why do some code bases allow you to easily make changes whereas other code bases make you want to pull your hair out and switch to a career in goose farming? In this lecture, we will explore some key principles that make code maintainable, adaptable, and easy to work with. We will start by defining what maintainable code really means, then explore the most important factors that contribute to it: readability, simplicity, single responsibility, abstractions, documentation, and testing.
      These concepts apply at any granularity, in any programming language, and with any programming paradigm. You will not only save yourself headaches but also make life less frustrating for anyone who has to work with your code (or at least make them less likely to be angry at you).
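      As a tiny illustration of the single-responsibility idea, compare two small functions that each do one thing against a hypothetical monolith that mixes computation and presentation (example invented for illustration, not taken from the lecture):

```python
# Single responsibility: separate "compute" from "present". The small
# functions are individually testable and reusable; a monolith that does
# both jobs at once is neither.

def mean(values):
    """Pure computation: easy to test and reuse."""
    return sum(values) / len(values)

def format_report(name, value):
    """Pure presentation: no hidden computation inside."""
    return f"{name}: {value:.2f}"

report = format_report("mean temperature", mean([20.0, 22.0, 24.0]))
# Instead of a single mean_temperature_report(values) doing both jobs.
```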

      Speaker: Niels Alexander Buegel
    • 13
      Exercise: Understanding Large Language Models and their Applications in Code Generation 31/3-004 - IT Amphitheatre


      Lecture

      The field of Deep Learning has experienced a significant turning point in the past years, driven by the emergence and rapid development of Large Language Models (LLMs). These advanced models have not only redefined standards in Natural Language Processing (NLP) but are increasingly being integrated into applications and services due to their natural language capabilities.

      Interest in using LLMs for coding has grown rapidly, and some companies have sought to turn natural language descriptions directly into generated code. However, this trend has already exposed several unresolved challenges in applying LLMs to coding. Despite these difficulties, it has spurred the development of AI-powered code generation tools, such as GitHub Copilot.

      In this lecture, we will introduce Large Language Models and explore the related terminology and core components. We will examine the process of creating an LLM from scratch and highlight the requirements for developing domain-specific LLMs, such as those for code generation. Additionally, we will review the initial footprints of these technologies in code generation and discuss their limitations in this domain. We will also explore strategies and architectural approaches that improve the performance of LLMs generally and for code generation specifically.

      We will conclude by addressing the ethical concerns surrounding LLMs, such as the authorship and ownership of AI-generated code. Finally, we will explore other applications of LLMs in science, particularly within the High Energy Physics community and at CERN.

      Hands-on

      An optional hands-on session may include:

      1. Some exercises to interact with an LLM from Python (either from a small pre-trained model in PyTorch or through API calls).
      2. Some exercises to explore domain-specific models for code generation.
      3. Hands-on exploration and testing of LLM components.
      4. Prompt Engineering strategies or small Fine-Tuning to improve the performance of a model.
      5. Some simplified strategies such as Retrieval Augmented Generation could also be implemented.
      Speaker: Andrea Valenzuela Ramirez (CERN)
    • 16:15
      Coffee 31/3-004 - IT Amphitheatre

    • 14
      Exercise: LLMs in Production: RAG pipelines and beyond 513-1-024

      Running Large Language Models (LLMs) in production presents many complexities that extend far beyond the choice of model. Key challenges include:

      • How do you address knowledge staleness (i.e. the model having been trained on out-of-date or irrelevant information)?
      • How do you balance cost optimisation with model latency?
      • How do you reduce bias and factual hallucinations?

      A widely adopted approach to address these is Retrieval Augmented Generation (RAG).

      RAG pipelines implement a two-tiered approach (retrieval, then generation), supplying models with domain-specific information before they generate a response to a question. Through these techniques, "off-the-shelf" LLMs can be applied to a much wider domain context than they were originally trained for.

      In this lecture, we will explore how to improve the adaptability of LLMs without the need for fine-tuning: covering RAG and related architectures, physics-based approaches like entropix that allow for self-reasoning / context aware sampling, and the challenges with applying these techniques in a production context.

      Speaker: Jack Charlie Munday (CERN)
    • 15
      Exercise: Under the Hood of the Snake: Behind the scenes of Python 31/3-004 - IT Amphitheatre


      Python is one of the most used programming languages in scientific computing - yet when discussed, we often admonish: "Python is slow!"

      What does that mean in practice? Does it depend on the use case? Most importantly: what is Python doing under the hood?

      Let's put our fangs into the inner workings of Python. I will talk about its design and what "Python is slow" means, what we are comparing to, and when this matters. Through examples we will see these things in action - both where the language is limited, and where the limitation might be your code...

      Python is very flexible, which comes with trade-offs compared to languages like C++. However, "slow" is not an absolute term. There are good practices that can be applied to speed up our programs, saving both time and computing resources. With the right techniques, we can learn to charm the snake!

      Speaker: Sten Astrand (Lund University)
    • 16
      Maximum Likelihood Fitting 31/3-004 - IT Amphitheatre


      Maximum likelihood fitting is central to many high-energy physics analyses, yet modern software makes it easy to use as a black box without understanding the underlying statistics.
      The statistics lectures in the main CSC and the tCSC on ML introduce the topic of likelihood, exploring the concept and showing its importance in data analysis. However, these lectures do not have time to dive into the more practical aspects of working with likelihoods, including performing maximum likelihood fits. A lecture on this topic, picking up where the CSC lectures left off and going into more depth on the fitting procedure, therefore makes for a natural continuation of the school.
      The lecture starts with a brief refresher on likelihoods, followed by an introduction to the concept of likelihood fitting and the underlying mathematics, and then goes into more detail on binned and profile likelihood fits. Afterwards, participants are given a hands-on exercise in which they perform a simple maximum likelihood fit themselves. Throughout, the focus is on the underlying statistics and calculations rather than on out-of-the-box, plug-and-play algorithms.
      This session will help participants build both conceptual and practical skills in maximum likelihood fitting.
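      The flavour of the hands-on part can be previewed with a self-contained example: the maximum-likelihood estimate of an exponential decay constant, found by scanning the negative log-likelihood directly rather than calling a plug-and-play fitter (a toy sketch; the actual exercise may differ):

```python
import math

# Unbinned maximum-likelihood fit of an exponential decay time tau.
# For the pdf f(t) = (1/tau) * exp(-t/tau), the negative log-likelihood is
#   NLL(tau) = n * log(tau) + sum(t_i) / tau,
# which is minimised analytically at tau = mean(t). We recover that by a scan.

def nll(tau, times):
    return len(times) * math.log(tau) + sum(times) / tau

times = [0.3, 1.1, 0.7, 2.0, 0.9]           # toy decay times
taus = [k / 100 for k in range(1, 500)]      # scan grid: 0.01 .. 4.99
tau_hat = min(taus, key=lambda tau: nll(tau, times))
# The scan minimum agrees with the analytic MLE, mean(times) = 1.0.
```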

      Speaker: Simon Thiele (University of Bonn (DE))
    • 17
      Illuminating the dark side of statistics: Bayesian inference in particle physics 31/3-004 - IT Amphitheatre


      High energy physics (HEP) has historically favoured frequentist statistical methodologies, leading to the development of analysis workflows and tools optimized around this paradigm. Frequentist techniques, such as those embedded in the HistFactory statistical model, have become standard in particle physics, often utilizing asymptotic approximations for efficient parameter estimation and hypothesis testing. While these methods are highly effective, their assumptions may restrict flexibility in certain analyses, especially when asymptotic approximations break down or simply do not apply.

      Bayesian approaches are starting to gain attention in HEP for their interpretative advantages, providing the full posterior distribution, which allows for flexible inference even when distributions deviate strongly from Gaussianity. Bayesian inference also handles cases with multiple parameters of interest more naturally and can incorporate prior information directly, without the need for auxiliary data. Additionally, it does not rely on asymptotic assumptions, making it well-suited for cases where such approximations may fail.

      Lecture plan

      More concretely, the lecture (1 hour) could be structured as follows:

      What makes an analysis "frequentist" or "Bayesian"?
      - The likelihood as a fundamental object
      - Priors vs. constraints

      Commonalities and differences in methodologies
      - Simplicity and speed of frequentist inference
      - Robustness and computational cost of Bayesian analyses
      - Non-Gaussian parameter distributions
      - Asymptotics and multiple parameters of interest
      - Bayesian updating
      - Confidence vs. credible intervals

      MCMC sampling for posterior estimation
      - Introduction to essential MCMC algorithms (e.g., Metropolis-Hastings, Hamiltonian Monte Carlo)
      - (Practical considerations in Bayesian computation and convergence diagnostics)

      Exercise plan

      The exercise session (1 hour) could include:

      Bayesian estimation of a mass parameter

      1. Select a simple frequentist analysis (e.g., mass scan yielding a non-Gaussian probability density function).
      2. Construct the likelihood function for the mass parameter.
      3. Implement or use an existing MCMC algorithm to sample from the posterior distribution.
      4. Compute and visualize the posterior distribution of the mass parameter.
      5. Derive the Bayesian credible interval and compare it to the frequentist confidence interval, discussing interpretational differences and implications.
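      Steps 3 and 4 can be sketched with a self-contained Metropolis-Hastings sampler. Here the log-posterior is a toy Gaussian in the mass parameter (mean 125, width 2) so the result can be checked; this is not the exercise's actual likelihood:

```python
import math
import random

# Minimal Metropolis-Hastings sampler: propose a symmetric random-walk
# step, accept it if the posterior ratio beats a uniform draw, otherwise
# repeat the current point. The chain then samples the posterior.

def log_posterior(m):
    # Toy Gaussian posterior for the mass parameter: mean 125, width 2.
    return -0.5 * ((m - 125.0) / 2.0) ** 2

def metropolis(n_steps, start=120.0, step=1.0, seed=42):
    rng = random.Random(seed)
    chain, m = [], start
    for _ in range(n_steps):
        proposal = m + rng.gauss(0.0, step)
        if math.log(rng.random()) < log_posterior(proposal) - log_posterior(m):
            m = proposal                  # accept the move
        chain.append(m)                   # rejected moves repeat the point
    return chain

chain = metropolis(20_000)
burned = chain[2_000:]                    # drop the burn-in phase
posterior_mean = sum(burned) / len(burned)  # should be close to 125
```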
      Speaker: Lorenz Gartner (LMU)
    • 11:00
      Coffee and networking 31/3-004 - IT Amphitheatre

    • 18
      Efficient Workflow Management in High-Energy Physics 31/3-004 - IT Amphitheatre


      In high-energy physics (HEP), efficient workflow management is crucial for processing large datasets, running simulations, and managing computational jobs across distributed environments. This lecture introduces Luigi, a workflow management tool originally developed at Spotify that helps automate and scale complex task pipelines, ensuring dependency resolution and fault tolerance.

      Building on Luigi, Law (Luigi analysis workflow) provides additional abstractions for HEP workflows by incorporating diverse batch job submission systems, such as HTCondor, and different execution environments. Additionally, the automatic management of access to distributed storage locations using the standard WLCG transfer protocols (e.g., WebDAV and XRootD) enables seamless execution of tasks across remote computing resources and ensures efficient resource utilization, greatly simplifying physicists' lives.

      The lecture will provide both conceptual insights and practical approaches for managing large-scale workflows in HEP, leveraging modern tools and distributed computing infrastructure to enhance scientific computing.
      Specifically, it will cover how Luigi and law can be used to construct robust and scalable workflows for HEP applications, from running jobs on distributed batch systems to automatically managing remote data access to conduct complex analyses.
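      The core idea, running each task exactly once and only after its requirements, can be mimicked in plain Python. This is a conceptual sketch of dependency resolution with invented task names, not Luigi's or law's actual API:

```python
# Conceptual sketch of what a workflow manager like Luigi does: given a
# dependency graph, run each task exactly once, after its requirements.

def run_workflow(tasks, target, done=None):
    """tasks maps name -> (list of required task names, callable)."""
    done = set() if done is None else done
    requires, action = tasks[target]
    for dep in requires:
        if dep not in done:
            run_workflow(tasks, dep, done)   # resolve dependencies first
    action()
    done.add(target)
    return done

order = []
tasks = {
    "fetch":  ([],                   lambda: order.append("fetch")),
    "select": (["fetch"],            lambda: order.append("select")),
    "histos": (["fetch"],            lambda: order.append("histos")),
    "plots":  (["select", "histos"], lambda: order.append("plots")),
}
run_workflow(tasks, "plots")
# order → ['fetch', 'select', 'histos', 'plots']: "fetch" runs only once.
```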

      Speaker: Cedric Verstege (KIT - Karlsruhe Institute of Technology (DE))
    • 12:30
      Lunch and networking
    • 19
      Automate All the Things: CI/CD for the Bold and the Brave 31/3-004 - IT Amphitheatre


      This lecture provides a practical, in-depth look at modern CI/CD (Continuous Integration and Continuous Deployment) best practices within GitLab and GitHub environments. CI/CD is essential for efficient software delivery and quality assurance, particularly in scientific computing where reliable code performance and scalability are crucial. In this session, participants will explore fundamental and advanced strategies for implementing robust CI/CD pipelines, tailored for both small projects and large-scale systems.

      The lecture will cover:
      - Core CI/CD principles that enhance software quality, collaboration, and deployment.
      - Pipeline configurations within GitLab and GitHub, highlighting their similarities and unique features.
      - Automation tools and integrations that complement CI/CD workflows, including Docker, Kubernetes, and other popular tools that facilitate testing, code analysis, and deployment automation.
      - Security and best practices for managing CI/CD processes in complex project environments.

      The session will be followed by a one-hour hands-on exercise focused on designing and building a GitLab CI/CD pipeline. Participants will gain practical experience with pipeline setup, configuring stages, automating tests, and deploying workflows, ensuring they are equipped with the skills to apply CI/CD practices in their own projects.
      This lecture aims to demystify CI/CD for developers and data scientists in scientific computing, equipping them with actionable knowledge and skills to streamline code quality, deployment, and project integration.
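      A minimal pipeline of the kind the hands-on session will build might look like the following hypothetical three-stage .gitlab-ci.yml (job names and scripts invented for illustration):

```yaml
# .gitlab-ci.yml: stages run in order; jobs within a stage run in parallel.
stages:
  - build
  - test
  - deploy

build-job:
  stage: build
  script:
    - echo "Compiling the project..."

unit-tests:
  stage: test
  script:
    - echo "Running unit tests..."

deploy-job:
  stage: deploy
  script:
    - echo "Deploying..."
  rules:
    - if: $CI_COMMIT_BRANCH == "main"   # deploy only from the main branch
```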

      Speaker: Elizabeth Mamtsits
    • 20
      Exercise: Reinforcement Learning in Particle Accelerators: a practical example 31/3-004 - IT Amphitheatre


      Reinforcement Learning for Particle Accelerator Control: A Real-World Example

      Hour 1: Introduction to Reinforcement Learning for Particle Accelerators

      • Basic Concepts:
        • Overview of Reinforcement Learning (RL) fundamentals.
        • Definitions and distinctions:
          • Model-free vs. model-based.
          • Off-policy vs. on-policy approaches.
      • Applications and Considerations:
        • Discussion of problem types and environmental variables affecting model selection in practical scenarios.
        • Analysis of drawbacks and benefits of different RL architectures.
      • Practical examples:
        • Real-world examples of RL in particle accelerators (e.g., CERN).
        • Case study introduction: Optimization of RF triple splittings in the Proton Synchrotron (PS).

      Hour 2: Optimizing RF Triple Splittings with Reinforcement Learning

      • Problem Definition:
        • Explanation of PS RF operations and the triple splitting optimization challenge for LHC-type beams.
        • Overview of the physics and parameters involved in the optimization.
      • Optimization Approach:
        • Justification for choosing RL and specific RL architectures.
        • Step-by-step walkthrough:
          • Initial simulations and trials.
          • Challenges and lessons learned.
          • Final operational solution deployed in the control room.

      Exercise Session: Training RL Agents for RF Optimization (1 hour)

      • Objective:
        • Train RL agents to optimize RF double splitting settings in simulation for improved beam quality.
      • Implementation:
        • Use SWAN notebooks with provided skeleton code.
        • Define a custom gymnasium environment for the double splitting problem, given:
          • Pre-implemented simulation data loaders.
          • A basic loss function for optimization.
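      The interface such an environment follows can be sketched without the library itself. The toy class below follows Gymnasium's reset/step conventions (returning observation, reward, terminated, truncated, info) but replaces the real RF simulation with an invented quadratic surrogate loss; the "physics" here is purely a placeholder, not the actual exercise code:

```python
import random

class DoubleSplitEnv:
    """Toy environment following the Gymnasium reset/step interface.

    The agent tunes two 'RF settings'; the reward is the negative value
    of an invented quadratic surrogate loss. The real exercise uses the
    provided simulation data loaders instead.
    """

    TARGET = (0.3, -0.5)   # hypothetical optimum of the surrogate loss

    def __init__(self, max_steps=50):
        self.max_steps = max_steps
        self.state = None
        self.steps = 0

    def reset(self, seed=None):
        if seed is not None:
            random.seed(seed)
        # Start from a random setting in [-1, 1]^2.
        self.state = [random.uniform(-1, 1), random.uniform(-1, 1)]
        self.steps = 0
        return list(self.state), {}

    def _loss(self, s):
        return sum((a - b) ** 2 for a, b in zip(s, self.TARGET))

    def step(self, action):
        # Action: a small additive correction to each setting, clipped to [-1, 1].
        self.state = [max(-1.0, min(1.0, s + a)) for s, a in zip(self.state, action)]
        self.steps += 1
        reward = -self._loss(self.state)
        terminated = self._loss(self.state) < 1e-3   # close enough to the optimum
        truncated = self.steps >= self.max_steps
        return list(self.state), reward, terminated, truncated, {}
```

      A training loop would repeatedly call `reset()` and `step(action)` exactly as with any Gymnasium environment.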
      Speaker: Joel Axel Wulff
    • 16:00
      Coffee and networking 31/3-004 - IT Amphitheatre
    • 21
      Exercise: Maximum Likelihood Fitting 31/3-004 - IT Amphitheatre

      Maximum likelihood fitting is central to many high-energy physics analyses, yet modern software makes it easy to use as a black box without understanding the underlying statistics.
      The statistics lectures in the main CSC and the tCSC on ML introduce the topic of likelihood, exploring the concept and showing its importance in data analysis. However, these lectures do not have the time to dive into the more practical aspects of working with likelihoods, including performing maximum likelihood fits. Therefore, I believe a lecture on this topic, picking up where the CSC lectures left off and going deeper into the fitting procedure, makes for a natural continuation of the school.
      I propose a lecture starting with a brief refresher on the topic of likelihoods, followed by an introduction to the concept of likelihood fitting and the underlying mathematics. Lastly, I will get more specific on the topics of binned and profile likelihood fits. After the lecture the school participants are given a hands-on exercise where they can perform a simple maximum likelihood fit themselves. Throughout, I want to focus on the underlying statistics and calculations instead of relying on out-of-the-box, plug-and-play algorithms.
      This session will help participants build both conceptual and practical skills in maximum likelihood fitting.
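      In that spirit, a maximum likelihood fit can be made concrete in a few self-contained lines, deliberately avoiding ready-made minimisers. This sketch (an invented example, not the exercise material) fits an exponential decay lifetime by numerically minimising the negative log-likelihood:

```python
import math
import random

def neg_log_likelihood(tau, data):
    """NLL for the exponential decay pdf f(t; tau) = (1/tau) * exp(-t/tau)."""
    n = len(data)
    return n * math.log(tau) + sum(data) / tau

def fit_tau(data, lo=1e-3, hi=100.0, iters=200):
    """Minimise the NLL by simple ternary search (valid: the 1D NLL is unimodal)."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if neg_log_likelihood(m1, data) < neg_log_likelihood(m2, data):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

random.seed(42)
true_tau = 2.5
data = [random.expovariate(1 / true_tau) for _ in range(10_000)]

tau_hat = fit_tau(data)
# Cross-check: for the exponential pdf, setting dNLL/dtau = 0 gives the
# closed-form MLE tau_hat = sample mean, which the numerical fit reproduces.
assert abs(tau_hat - sum(data) / len(data)) < 1e-6
```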

      Speaker: Simon Thiele (University of Bonn (DE))
    • 22
      Exercise: Bayesian inference in particle physics 31/3-004 - IT Amphitheatre

      High energy physics (HEP) has historically favoured frequentist statistical methodologies, leading to the development of analysis workflows and tools optimized around this paradigm. Frequentist techniques, such as those embedded in the HistFactory statistical model, have become standard in particle physics, often utilizing asymptotic approximations for efficient parameter estimation and hypothesis testing. While these methods are highly effective, their assumptions may restrict flexibility in certain analyses, especially when asymptotic approximations break down or simply do not apply.

      Bayesian approaches are starting to gain attention in HEP for their interpretative advantages, providing the full posterior distribution, which allows for flexible inference even when distributions deviate strongly from Gaussianity. Bayesian inference also handles cases with multiple parameters of interest more naturally and can incorporate prior information directly, without the need for auxiliary data. Additionally, Bayesian inference does not rely on asymptotic assumptions, making it well-suited for cases where such approximations may fail.

      Lecture plan

      More concretely, the lecture (1 hour) could be structured as follows:

      What makes an analysis "frequentist" or "Bayesian"?
      - The likelihood as a fundamental object
      - Priors vs. constraints

      Commonalities and differences in methodologies
      - Simplicity and speed of frequentist inference
      - Robustness and computational cost of Bayesian analyses
      - Non-Gaussian parameter distributions
      - Asymptotics and multiple parameters of interest
      - Bayesian updating
      - Confidence vs. credible intervals

      MCMC sampling for posterior estimation
      - Introduction to essential MCMC algorithms (e.g., Metropolis-Hastings, Hamiltonian Monte Carlo)
      - (Practical considerations in Bayesian computation and convergence diagnostics)

      Exercise plan

      The exercise session (1 hour) could include:

      Bayesian analysis of a mass parameter estimation

      1. Select a simple frequentist analysis (e.g., mass scan yielding a non-Gaussian probability density function).
      2. Construct the likelihood function for the mass parameter.
      3. Implement or use an existing MCMC algorithm to sample from the posterior distribution.
      4. Compute and visualize the posterior distribution of the mass parameter.
      5. Derive the Bayesian credible interval and compare it to the frequentist confidence interval, discussing interpretational differences and implications.
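      Step 3 above needs little more than a log-posterior and a symmetric proposal. The following bare-bones Metropolis-Hastings sampler uses an invented non-Gaussian log-posterior (a two-component mixture with a flat prior) as a stand-in for a real mass-scan likelihood; the exercise would substitute the likelihood constructed in step 2:

```python
import math
import random

def log_posterior(m):
    """Invented un-normalised log-posterior for a 'mass' parameter: a skewed,
    non-Gaussian shape from two Gaussian components, flat prior on (0, 10)."""
    if not 0.0 < m < 10.0:
        return float("-inf")
    return math.log(
        math.exp(-0.5 * ((m - 4.0) / 0.5) ** 2)
        + 0.3 * math.exp(-0.5 * ((m - 5.5) / 1.0) ** 2)
    )

def metropolis_hastings(log_p, start, n_samples, step=0.5, seed=0):
    rng = random.Random(seed)
    x, lp = start, log_p(start)
    samples = []
    for _ in range(n_samples):
        # Symmetric Gaussian proposal, so the acceptance ratio is p(x')/p(x).
        x_new = x + rng.gauss(0.0, step)
        lp_new = log_p(x_new)
        if math.log(rng.random()) < lp_new - lp:
            x, lp = x_new, lp_new
        samples.append(x)
    return samples

chain = metropolis_hastings(log_posterior, start=4.0, n_samples=20_000)
burned = chain[5_000:]              # discard burn-in
mean = sum(burned) / len(burned)    # posterior mean of the 'mass'
```

      Credible intervals (step 5) would then be read off from quantiles of `burned`.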
      Speaker: Lorenz Gartner (LMU)
    • 23
      Breaking RSA and picking up the pieces 31/3-004 - IT Amphitheatre

      As quantum computers advance, they pose a significant threat to our current cryptographic infrastructure, particularly RSA encryption. This presentation will explore how RSA can be broken using Shor's algorithm and examine the landscape of post-quantum encryption algorithms.

      Presentation Overview

      Introduction to RSA and Its Importance in Modern Cryptography

      • Brief history of RSA
      • Current widespread use in secure online transactions and communications

      The Quantum Threat: Shor's Algorithm and Its Impact on RSA

      • Explanation of Shor's algorithm
      • How quantum computers can factor large numbers exponentially faster than classical computers
      • Implications for RSA security

      Post-Quantum Cryptography: An Overview

      • Introduction to post-quantum cryptographic algorithms
      • Types of post-quantum cryptography (lattice-based, code-based, multivariate polynomial, hash-based signatures)

      Standardization Efforts: NIST's Post-Quantum Cryptography Project

      • Overview of NIST's standardization process
      • Selected algorithms (CRYSTALS-Dilithium, FALCON, SPHINCS+, CRYSTALS-Kyber)
      • Challenges in standardization and implementation

      Implementation Considerations for Post-Quantum Algorithms

      • Integration into existing cryptographic libraries
      • Performance comparisons with classical algorithms
      • Security analysis and known vulnerabilities

      Key Takeaways

      1. Understanding of the quantum threat to RSA and current public-key cryptography
      2. Knowledge of post-quantum cryptographic algorithms and their types
      3. Insights into implementation challenges and migration strategies
      4. Insights into preparing for the post-quantum era in cybersecurity
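      The threat can be made concrete with textbook RSA on deliberately tiny primes (purely illustrative): recovering the private key reduces to factoring n. Classically, the brute-force trial division below is exponential in the bit length of n; Shor's algorithm performs the same factoring step in polynomial time on a quantum computer:

```python
from math import gcd

# Textbook RSA with tiny primes; p and q would normally be secret.
p, q = 61, 53
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)        # Euler's totient, requires the factorisation
e = 17                         # public exponent
assert gcd(e, phi) == 1
d = pow(e, -1, phi)            # private exponent

msg = 65
cipher = pow(msg, e, n)        # encrypt: c = m^e mod n
assert pow(cipher, d, n) == msg

# "Breaking" RSA = recovering p and q from the public n alone.
def factor(n):
    for f in range(2, int(n ** 0.5) + 1):
        if n % f == 0:
            return f, n // f
    raise ValueError("n is prime")

p_found, q_found = factor(n)
phi_found = (p_found - 1) * (q_found - 1)
d_found = pow(e, -1, phi_found)
assert pow(cipher, d_found, n) == msg   # attacker decrypts without the private key
```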
      Speaker: Vasvi Sharma
    • 24
      On equivariance and explainability 31/3-004 - IT Amphitheatre

      Permutation equivariant and Lorentz invariant neural networks have garnered the attention of the ML community at large for a while now, finding particular success in cases where symmetries in the data space can be exploited to overcome issues of low statistics and constraints on training time or model size. In high energy physics, however, neither problem is typical for us: due to the inherently probabilistic nature of our experiments, we can simulate datasets that would excite even the most seasoned OpenAI engineer, and our computational power is on a vastly different scale compared to the average ML enthusiast's.

      The main strength of equivariant networks in our field lies in a different aspect, in fact, one of the main aspects that had the HEP community hesitant to adopt ML solutions in the first place, namely that of explainability. Compared to other approaches that use non-specialized architectures with many parameters and high flexibility but don’t take into account underlying physics principles, equivariant networks provide reduced complexity and increased interpretability—two key factors when searching for new physics phenomena in underexplored parameter spaces.

      This talk will introduce the audience to the concepts and benefits of equivariant methods in ML. After a brief motivation for this subfield, accompanied by a short maths lesson, the audience will be introduced to the core concepts of equivariant networks, followed by examples of architectures and applications. The hour will conclude with a discussion of the newest developments in this and related fields, perhaps with a brief look at applications outside of physics.
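      One classic construction behind permutation-invariant architectures is the Deep Sets idea, f(X) = rho(sum_i phi(x_i)): pooling per-element embeddings with a commutative operation makes the output order-independent by construction. This minimal numerical sketch (not necessarily one of the architectures covered in the talk; phi and rho would be learned networks in practice) demonstrates the invariance:

```python
import math
import random

random.seed(1)
# Fixed random parameters of the per-element embedding (stand-in for a
# learned network).
PARAMS = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(8)]

def phi(x):
    """Per-element embedding: a fixed random one-layer map."""
    return [math.tanh(w * x + b) for w, b in PARAMS]

def rho(z):
    """Read-out applied to the pooled representation."""
    return sum(z) / len(z)

def deep_set(xs):
    """f(X) = rho( sum_i phi(x_i) ): invariant under permutations of X,
    because addition is commutative."""
    pooled = [sum(col) for col in zip(*(phi(x) for x in xs))]
    return rho(pooled)

xs = [0.2, -1.3, 0.7, 2.1]
shuffled = [2.1, 0.2, 0.7, -1.3]
assert abs(deep_set(xs) - deep_set(shuffled)) < 1e-12
```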

      Speaker: Kaare Endrup Iversen (Lund University (SE))
    • 11:00
      Coffee and networking 31/3-004 - IT Amphitheatre
    • 25
      Data Processing with FPGAs: Parallel Computing on Compact Configurable Logic 31/3-004 - IT Amphitheatre

      In particle physics, Field-Programmable Gate Arrays (FPGAs) play an increasing role in acquiring, processing, and filtering data at experiments with high data rates. Today their advanced technology allows for complex processing, rather than just serving as glue logic on the hardware level. The high degree of parallelism, flexibility, and large number of I/Os, in combination with low power consumption and small size, make them a valid choice for many applications in science and industry.

      A drawback is the high complexity of designing efficient and optimized firmware using Hardware Description Languages (HDLs). High-Level Synthesis (HLS) approaches, however, are gaining more and more popularity. They allow for comparatively simple and effective implementation of complex algorithms, for example in real-time image processing or deep learning.

      In this lecture, I will discuss how FPGAs operate, their building blocks, properties, advantages, challenges, and general feasibility in an experimental context. I will go through the general process of designing logic at the RTL level with a beginner-friendly code example to provide a more tangible treatment. Recent developments, comparisons to other widely used computing architectures, and exemplary use cases will round off the lecture, concluding with a short look at HLS as an alternative approach suitable for software engineers.
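      The lecture's RTL example will use an HDL; as a language-neutral stand-in (purely illustrative, not the lecture's code), the clocked-register idea can be modelled in a few lines of Python, where each `tick()` plays the role of a rising clock edge:

```python
class Counter4:
    """Behavioural model of a 4-bit synchronous counter with enable and an
    active-high synchronous reset. In an HDL this would be a clocked
    process; here one tick() call models one rising clock edge."""

    def __init__(self):
        self.q = 0          # the register (flip-flops) holding the count

    def tick(self, reset=False, enable=True):
        if reset:
            self.q = 0
        elif enable:
            self.q = (self.q + 1) % 16   # wrap around at 4 bits
        return self.q

c = Counter4()
for _ in range(5):
    c.tick()
assert c.q == 5
c.tick(reset=True)
assert c.q == 0
```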

      Speaker: Peter Hinderberger (Technical University of Munich)
    • 12:30
      Lunch and networking
    • 26
      Web for the win! A crash course in building web apps 31/3-004 - IT Amphitheatre

      Abstract/Description:
      "Welcome to web dev 101! In this lecture we will go over the fundamentals (HTML, CSS, JS) and tooling, give an overview of common UI libraries and frameworks, teach you the basics of React development and testing, give a tl;dr of user experience design and APIs, and highlight modern features such as WebAssembly and WebGL.
      Development of graphical user interfaces (GUIs) is often a serious and time-consuming undertaking, through all the phases from requirement gathering, design, and user experience to the complexities of cross-platform development with native GUI libraries. Web apps to the rescue!
      Building web apps can help speed up this process and provide a platform-agnostic interface for your software, which can run locally or remotely. Utilising design systems can take the pain out of user interface design, allowing you to build beautiful and accessibility-friendly UI components quickly. API backends can enhance the web app with capabilities to use external databases, software, and programs."
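      The API-backend idea can be sketched with nothing but the Python standard library: a toy JSON endpoint that a web frontend could fetch() (the path and payload are invented for illustration):

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class ApiHandler(BaseHTTPRequestHandler):
    """Minimal JSON API: a web app's frontend would fetch() this URL."""

    def do_GET(self):
        if self.path == "/api/status":
            body = json.dumps({"status": "ok", "runs": 3}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):    # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), ApiHandler)   # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Simulate the frontend's request against the running backend.
url = f"http://127.0.0.1:{server.server_port}/api/status"
with urllib.request.urlopen(url) as resp:
    data = json.load(resp)
server.shutdown()
assert data["status"] == "ok"
```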

      Speaker: George Coldstream
    • 27
      Closing remarks 31/3-004 - IT Amphitheatre
      Speaker: Alberto Pace (CERN)
    • 15:15
      Coffee and networking 31/3-004 - IT Amphitheatre
    • 28
      Exercise: Automate All the Things: CI/CD for the Bold and the Brave 513-1-024

      This lecture provides a practical, in-depth look at modern CI/CD (Continuous Integration and Continuous Deployment) best practices within GitLab and GitHub environments. CI/CD is essential for efficient software delivery and quality assurance, particularly in scientific computing where reliable code performance and scalability are crucial. In this session, participants will explore fundamental and advanced strategies for implementing robust CI/CD pipelines, tailored for both small projects and large-scale systems.

      The lecture will cover:
      - Core CI/CD principles that enhance software quality, collaboration, and deployment.
      - Pipeline configurations within GitLab and GitHub, highlighting their similarities and unique features.
      - Automation tools and integrations that complement CI/CD workflows, including Docker, Kubernetes, and other popular tools that facilitate testing, code analysis, and deployment automation.
      - Security and best practices for managing CI/CD processes in complex project environments.

      The session will be followed by a one-hour hands-on exercise focused on designing and building a GitLab CI/CD pipeline. Participants will gain practical experience with pipeline setup, configuring stages, automating tests, and deploying workflows, ensuring they are equipped with the skills to apply CI/CD practices in their own projects.
      This lecture aims to demystify CI/CD for developers and data scientists in scientific computing, equipping them with actionable knowledge and skills to streamline code quality, deployment, and project integration.

      Speaker: Elizabeth Mamtsits
    • 29
      Exercise: Efficient Workflow Management in High-Energy Physics 31-S-023

      In high-energy physics (HEP), efficient workflow management is crucial for processing large datasets, running simulations, and managing computational jobs across distributed environments. This lecture introduces Luigi, a workflow management tool originally developed at Spotify that helps automate and scale complex task pipelines, ensuring dependency resolution and fault tolerance.

      Building on Luigi, Law (Luigi analysis workflow) provides additional abstractions for HEP workflows by incorporating diverse batch job submission systems, such as HTCondor, and different execution environments. Additionally, the automatic management of access to distributed storage locations using the standard WLCG transfer protocols (e.g., WebDAV and XRootD) enables seamless execution of tasks across remote computing resources and ensures efficient resource utilization, greatly simplifying physicists' day-to-day work.

      The lecture will provide both conceptual insights and practical approaches for managing large-scale workflows in HEP, leveraging modern tools and distributed computing infrastructure to enhance scientific computing.
      Specifically, it will cover how Luigi and Law can be used to construct robust and scalable workflows for HEP applications, from running jobs on distributed batch systems to automatically managing remote data access to conduct complex analyses.
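      Luigi's core abstraction (a task declares `requires()`, `output()`, and `run()`, and the scheduler builds missing outputs in dependency order) can be illustrated without the library itself. The following toy re-implementation mimics that model in plain Python; real Luigi and Law add the scheduling, batch submission, and remote-storage machinery described above:

```python
import os
import tempfile

class Task:
    """Toy version of Luigi's task model: dependencies (requires), an
    artefact (output), and a recipe to build it (run)."""

    def requires(self):
        return []

    def output(self):
        raise NotImplementedError

    def run(self):
        raise NotImplementedError

    def complete(self):
        return os.path.exists(self.output())

def build(task):
    """Depth-first dependency resolution, like a minimal scheduler."""
    for dep in task.requires():
        build(dep)
    if not task.complete():     # skip work whose output already exists
        task.run()

workdir = tempfile.mkdtemp()

class Simulate(Task):
    def output(self):
        return os.path.join(workdir, "events.txt")
    def run(self):
        with open(self.output(), "w") as f:
            f.write("\n".join(str(i * i) for i in range(10)))

class Analyse(Task):
    def requires(self):
        return [Simulate()]
    def output(self):
        return os.path.join(workdir, "result.txt")
    def run(self):
        with open(Simulate().output()) as f:
            total = sum(int(line) for line in f)
        with open(self.output(), "w") as f:
            f.write(str(total))

# Building Analyse transparently builds Simulate first.
build(Analyse())
```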

      Speaker: Cedric Verstege (KIT - Karlsruhe Institute of Technology (DE))
    • 30
      Exercise: Web for the win! A crash course in building web apps 513-1-024

      Abstract/Description:
      "Welcome to web dev 101! In this lecture we will go over the fundamentals (HTML, CSS, JS) and tooling, give an overview of common UI libraries and frameworks, teach you the basics of React development and testing, give a tl;dr of user experience design and APIs, and highlight modern features such as WebAssembly and WebGL.
      Development of graphical user interfaces (GUIs) is often a serious and time-consuming undertaking, through all the phases from requirement gathering, design, and user experience to the complexities of cross-platform development with native GUI libraries. Web apps to the rescue!
      Building web apps can help speed up this process and provide a platform-agnostic interface for your software, which can run locally or remotely. Utilising design systems can take the pain out of user interface design, allowing you to build beautiful and accessibility-friendly UI components quickly. API backends can enhance the web app with capabilities to use external databases, software, and programs."

      Speaker: George Coldstream