15–19 Sept 2025
CERN
Europe/Zurich timezone

Session

AI Infrastructure for Model Training

16 Sept 2025, 10:30
40/S2-A01 - Salle Anderson (CERN)

Capacity: 100

Presentation materials

There are no materials yet.

  1. Dr Sofia Vallecorsa (CERN)
    16/09/2025, 10:55
    4. AI Infrastructure for Model Training

    Training large-scale generative models for particle detector simulation is computationally demanding, contributing significantly to energy consumption. This project focuses on developing energy-efficient training strategies for generative models used in detector simulation. By integrating energy-aware optimization strategies, mixed-precision training and sustainability metrics, the project...
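The mixed-precision training mentioned in the abstract rests on a simple observation: lower-precision floats halve the bytes moved per step, and data movement dominates energy cost on modern accelerators. A minimal numpy sketch of the footprint difference (illustrative only, not the project's code):

```python
import numpy as np

# One million synthetic detector-response values in single precision.
full = np.random.default_rng(0).standard_normal(1_000_000).astype(np.float32)

# Casting to half precision halves the memory footprint and traffic;
# mixed-precision training keeps a float32 master copy of the weights
# and runs the bulk of the arithmetic in float16/bfloat16.
half = full.astype(np.float16)

print(full.nbytes, half.nbytes)
```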

  2. Sebastian Wuchterl (CERN)
    16/09/2025, 11:00
    4. AI Infrastructure for Model Training

    Development of a cutting-edge deep learning framework for HEP objects and analysis tasks, automating those tasks with optimized data structures, reduced CPU overhead, and efficient GPU usage. The functionalities include model and feature modularity, benchmarking, hyperparameter optimization, distributed running, optimized data structures, data loading, and inference optimization. One option as a baseline...

  3. Apostolos Karvelas (CERN)
    16/09/2025, 11:05
    4. AI Infrastructure for Model Training

    This project focuses on establishing a dedicated MLOps environment tailored to the needs of the online operations of the LHCb experiment. Its goal is to enable the development, optimization, and deployment of machine learning models entirely within the LHCb technical network, using LHCb-managed resources and directly supporting online workflows.

    The first phase of the project, focused on...

  4. Stephan Hageboeck (CERN)
    16/09/2025, 11:10
    4. AI Infrastructure for Model Training

    In the HL-LHC era, ever larger datasets for ML training are in sight. These will enable the training of increasingly complex models, but the sheer volume of data may exhaust the capabilities of the machines that run the training: the data may neither fit in RAM, nor is keeping it on fast storage necessarily cost-effective.
    In this project, ROOT and the existing CERN infrastructure such as...
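The usual answer to data that exceeds RAM is out-of-core streaming: keep the dataset on disk and materialise one mini-batch at a time. A stdlib-plus-numpy sketch of the idea, using a memory-mapped file as a stand-in for a dataset far larger than memory (illustrative only; the project itself builds on ROOT):

```python
import os
import tempfile
import numpy as np

# Hypothetical setup: a small stand-in file; in practice this would be
# terabytes of training data that cannot be held in RAM at once.
path = os.path.join(tempfile.mkdtemp(), "events.npy")
np.save(path, np.arange(1_000, dtype=np.float32).reshape(100, 10))

# Memory-map the file: pages are read lazily, so resident memory stays
# bounded regardless of the file size.
data = np.load(path, mmap_mode="r")

def batches(arr, batch_size):
    """Yield contiguous mini-batches without loading the whole array."""
    for start in range(0, len(arr), batch_size):
        # np.array() materialises just this batch in RAM.
        yield np.array(arr[start:start + batch_size])

n = sum(len(b) for b in batches(data, 32))
print(n)
```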

  5. Dr Vincenzo Eduardo Padulano (CERN)
    16/09/2025, 11:15
    4. AI Infrastructure for Model Training

    Training ML models on High Energy Physics data currently requires either expensive copies and conversion to an intermediate format, or custom I/O pipelines built by the end user. ROOT provides a prototype system for ingesting data in the common TTree format (with support for the future RNTuple format as well) directly into the ML model. This requires zero conversion steps and is...

  6. Andre Sailer (CERN)
    16/09/2025, 11:20
    4. AI Infrastructure for Model Training

    AI/ML tools evolve quickly; new versions and new packages are constantly being created. Providing new and updated packages in a consistent manner, and for a distributed environment, takes dedicated effort to avoid scalability issues. The LCG software stacks provide a wide range of AI/ML and related packages via CVMFS, such as tensorflow, torch, jax, CUDA, and ROOT. As part of the RCS/AI...
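In practice, an LCG stack is activated with a single environment setup script from CVMFS. A fragment of the usual pattern (the release name and platform tag below are examples and change between stacks; check the directories under /cvmfs/sft.cern.ch/lcg/views/ for the ones matching your machine):

```shell
# Source an LCG view from CVMFS to get a consistent Python + ML stack.
# LCG_107 and x86_64-el9-gcc13-opt are illustrative; pick your own.
source /cvmfs/sft.cern.ch/lcg/views/LCG_107/x86_64-el9-gcc13-opt/setup.sh

# Packages such as torch and tensorflow now resolve from CVMFS.
python -c "import torch, tensorflow"
```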

  7. Valentin Volkl (CERN)
    16/09/2025, 11:25
    4. AI Infrastructure for Model Training

    Modern AI training for complex neural networks demands low-latency access to multi-petabyte datasets, versioned software stacks, and reproducible environments, mirroring challenges traditionally addressed by CVMFS in scientific domains. While at its core a software distribution tool, CVMFS can provide a general filesystem view on external data in object stores. This data-distribution over...

  8. Raulian-Ionut Chiorescu, Ricardo Rocha (CERN)
    16/09/2025, 11:30
    4. AI Infrastructure for Model Training

    As AI/ML usage and use cases grow at CERN, training at scale, as well as testing, benchmarking, and validation on newer-generation devices, requires access to resources not currently available on-premises.

    This activity involves setting up the required integrations in the CERN MLOps infrastructure to accommodate these requirements as seamlessly as possible. The work considers integration with...

  9. Lena Maria Herrmann
    16/09/2025, 11:35
    4. AI Infrastructure for Model Training

    Event reconstruction is key to unlocking the full physics potential of the Future Circular Collider (FCC). Particle Flow (PF) techniques, which combine information from different subdetectors, rely on precise and well-understood inputs. Classical approaches often use hand-crafted features and detector-specific preprocessing, but machine learning (ML) methods require a different level of...

  10. David Gutierrez Rueda (CERN), Eric Grancher (CERN)
    16/09/2025, 11:40
    4. AI Infrastructure for Model Training

    While the infrastructure supporting AI/ML can run in the cloud or use existing HPC resources, this proposal considers the need to support on-premises AI/ML workloads with stringent requirements on performance, bandwidth, latency, and lossless communication over Ethernet.

    If the CERN/RCS strategy for AI includes the support of high-performance resources in CERN's Datacentres for AI/ML...

  11. Matteo Bunino (CERN), Dr Maria Girone (CERN)
    16/09/2025, 11:45
    4. AI Infrastructure for Model Training

    This proposal focuses on the further development and adoption of the itwinai framework, designed to help scientists scale their AI workloads on HPC and cloud systems while minimizing engineering overhead. itwinai provides high-level, reproducible workflows for distributed machine learning training and hyperparameter optimization using tools such as PyTorch DDP, DeepSpeed, Horovod, and Ray...
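The hyperparameter-optimization part can be pictured with a stdlib-only random search; the objective below is a hypothetical stand-in (a real itwinai run would launch distributed training trials, e.g. via Ray), so the function, names, and ranges are illustrative:

```python
import random

def validation_loss(lr, batch_size):
    """Hypothetical objective: stands in for training a model and
    measuring validation loss. Real trials would train on HPC nodes."""
    return (lr - 0.01) ** 2 + 0.001 * abs(batch_size - 64)

rng = random.Random(42)
best = None
for _ in range(50):  # 50 independent random trials
    trial = {
        "lr": 10 ** rng.uniform(-4, -1),              # log-uniform learning rate
        "batch_size": rng.choice([16, 32, 64, 128]),  # discrete batch sizes
    }
    loss = validation_loss(**trial)
    if best is None or loss < best[0]:
        best = (loss, trial)

print(best)
```

Random search is trivially parallel, which is why frameworks hand each trial to a separate worker and only aggregate the (loss, config) pairs.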
