15–19 Sept 2025
CERN
Europe/Zurich timezone

Session

AI Infrastructure for Model Training

16 Sept 2025, 10:30
40/S2-A01 - Salle Anderson (CERN)

Capacity: 100

Presentation materials

There are no materials yet.

  1. Dr Sofia Vallecorsa (CERN)
    16/09/2025, 10:55
    4. AI Infrastructure for Model Training

    Training large-scale generative models for particle detector simulation is computationally demanding, contributing significantly to energy consumption. This project focuses on developing energy-efficient training strategies for generative models used in detector simulation. By integrating energy-aware optimization strategies, mixed-precision training and sustainability metrics, the project...
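The mixed-precision training mentioned in the abstract rests on a simple observation: lower-precision floats halve the bytes moved per step, and data movement dominates energy cost on modern accelerators. A minimal numpy sketch of the footprint difference (illustrative only, not the project's code):

```python
import numpy as np

# One million synthetic detector-response values in single precision.
full = np.random.default_rng(0).standard_normal(1_000_000).astype(np.float32)

# Casting to half precision halves the memory footprint and traffic;
# mixed-precision training keeps a float32 master copy of the weights
# and runs the bulk of the arithmetic in float16/bfloat16.
half = full.astype(np.float16)

print(full.nbytes, half.nbytes)
```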

  2. Sebastian Wuchterl (CERN)
    16/09/2025, 11:00
    4. AI Infrastructure for Model Training

    Development of a cutting-edge deep learning framework for HEP objects and analysis tasks, automating those tasks with optimized data structures, reduced CPU overhead, and efficient GPU usage. The functionalities include model and feature modularity, benchmarking, hyperparameter optimization, distributed running, optimized data structures, data loading, and inference optimization. One option as a baseline...

  3. Apostolos Karvelas (CERN)
    16/09/2025, 11:05
    4. AI Infrastructure for Model Training

    This project focuses on establishing a dedicated MLOps environment tailored to the needs of the online operations of the LHCb experiment. Its goal is to enable the development, optimization, and deployment of machine learning models entirely within the LHCb technical network, using LHCb-managed resources and directly supporting online workflows.

    The first phase of the project, focused on...

  4. Stephan Hageboeck (CERN)
    16/09/2025, 11:10
    4. AI Infrastructure for Model Training

    In the HL-LHC era, ever larger datasets for ML training are in sight. These will enable the training of increasingly complex models, but the sheer volume of data may exhaust the capabilities of the machines that run the training: the data may neither fit in RAM, nor is keeping it on fast storage necessarily cost-effective.
    In this project, ROOT and the existing CERN infrastructure such as...
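The usual answer to data that exceeds RAM is out-of-core streaming: keep the dataset on disk and materialise one mini-batch at a time. A stdlib-plus-numpy sketch of the idea, using a memory-mapped file as a stand-in for a dataset far larger than memory (illustrative only; the project itself builds on ROOT):

```python
import os
import tempfile
import numpy as np

# Hypothetical setup: a small stand-in file; in practice this would be
# terabytes of training data that cannot be held in RAM at once.
path = os.path.join(tempfile.mkdtemp(), "events.npy")
np.save(path, np.arange(1_000, dtype=np.float32).reshape(100, 10))

# Memory-map the file: pages are read lazily, so resident memory stays
# bounded regardless of the file size.
data = np.load(path, mmap_mode="r")

def batches(arr, batch_size):
    """Yield contiguous mini-batches without loading the whole array."""
    for start in range(0, len(arr), batch_size):
        # np.array() materialises just this batch in RAM.
        yield np.array(arr[start:start + batch_size])

n = sum(len(b) for b in batches(data, 32))
print(n)
```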

  5. Dr Vincenzo Eduardo Padulano (CERN)
    16/09/2025, 11:15
    4. AI Infrastructure for Model Training

    Training ML models on High Energy Physics data currently requires either expensive copies and conversion to an intermediate format, or custom I/O pipelines built by the end user. ROOT provides a prototype system for ingesting data in the common TTree format (with support for the future RNTuple format as well) directly into the ML model. This requires zero conversion steps and is...

  6. Andre Sailer (CERN)
    16/09/2025, 11:20
    4. AI Infrastructure for Model Training

    AI/ML tools evolve quickly; new versions and new packages are constantly being created. Providing new and updated packages in a consistent manner, and for a distributed environment, takes dedicated effort to avoid scalability issues. The LCG software stacks provide a wide range of AI/ML and related packages via CVMFS, such as tensorflow, torch, jax, CUDA, and ROOT. As part of the RCS/AI...
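In practice, an LCG stack is activated with a single environment setup script from CVMFS. A fragment of the usual pattern (the release name and platform tag below are examples and change between stacks; check the directories under /cvmfs/sft.cern.ch/lcg/views/ for the ones matching your machine):

```shell
# Source an LCG view from CVMFS to get a consistent Python + ML stack.
# LCG_107 and x86_64-el9-gcc13-opt are illustrative; pick your own.
source /cvmfs/sft.cern.ch/lcg/views/LCG_107/x86_64-el9-gcc13-opt/setup.sh

# Packages such as torch and tensorflow now resolve from CVMFS.
python -c "import torch, tensorflow"
```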

  7. Valentin Volkl (CERN)
    16/09/2025, 11:25
    4. AI Infrastructure for Model Training

    Modern AI training for complex neural networks demands low-latency access to multi-petabyte datasets, versioned software stacks, and reproducible environments, mirroring challenges traditionally addressed by CVMFS in scientific domains. While at its core a software distribution tool, CVMFS can provide a general filesystem view on external data in object stores. This data-distribution over...

  8. Raulian-Ionut Chiorescu, Ricardo Rocha (CERN)
    16/09/2025, 11:30
    4. AI Infrastructure for Model Training

    As AI/ML usage and use cases grow at CERN, training at scale, as well as testing, benchmarking, and validation on newer-generation devices, requires access to resources not currently available on-premises.

    This activity involves setting up the required integrations in the CERN MLOps infrastructure to accommodate these requirements as seamlessly as possible. The work considers integration with...

  9. Lena Maria Herrmann
    16/09/2025, 11:35
    4. AI Infrastructure for Model Training

    Event reconstruction is key to unlocking the full physics potential of the Future Circular Collider (FCC). Particle Flow (PF) techniques, which combine information from different subdetectors, rely on precise and well-understood inputs. Classical approaches often use hand-crafted features and detector-specific preprocessing, but machine learning (ML) methods require a different level of...

  10. David Gutierrez Rueda (CERN), Eric Grancher (CERN)
    16/09/2025, 11:40
    4. AI Infrastructure for Model Training

    While the infrastructure supporting AI/ML can run in the cloud or use existing HPC resources, this proposal considers the need to support on-premises AI/ML workloads with stringent requirements on performance, bandwidth, latency, and lossless communication over Ethernet.

    If the CERN/RCS strategy for AI includes the support of high-performance resources in CERN's Datacentres for AI/ML...

  11. Matteo Bunino (CERN), Dr Maria Girone (CERN)
    16/09/2025, 11:45
    4. AI Infrastructure for Model Training

    This proposal focuses on the further development and adoption of the itwinai framework, designed to help scientists scale their AI workloads on HPC and cloud systems while minimizing engineering overhead. itwinai provides high-level, reproducible workflows for distributed machine learning training and hyperparameter optimization using tools such as PyTorch DDP, DeepSpeed, Horovod, and Ray...
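The hyperparameter-optimization part can be pictured with a stdlib-only random search; the objective below is a hypothetical stand-in (a real itwinai run would launch distributed training trials, e.g. via Ray), so the function, names, and ranges are illustrative:

```python
import random

def validation_loss(lr, batch_size):
    """Hypothetical objective: stands in for training a model and
    measuring validation loss. Real trials would train on HPC nodes."""
    return (lr - 0.01) ** 2 + 0.001 * abs(batch_size - 64)

rng = random.Random(42)
best = None
for _ in range(50):  # 50 independent random trials
    trial = {
        "lr": 10 ** rng.uniform(-4, -1),              # log-uniform learning rate
        "batch_size": rng.choice([16, 32, 64, 128]),  # discrete batch sizes
    }
    loss = validation_loss(**trial)
    if best is None or loss < best[0]:
        best = (loss, trial)

print(best)
```

Random search is trivially parallel, which is why frameworks hand each trial to a separate worker and only aggregate the (loss, config) pairs.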
