Speaker
Description
The CERN EP-SFT group, in their summary paper, proposed that "a common end-to-end fast-simulation tool could be created across experiments to complement the GEANT library." Building on the experience gained by LHCb in developing its Flash Simulation framework, Lamarr, several key challenges have emerged in integrating machine learning (ML) algorithms into high-energy physics software stacks:
- ML models are typically lightweight, but the event-level granularity of the Gaudi scheduler complicates batching particles across multiple events. This results in frequent model invocations and significant overhead when using dedicated runtimes.
- Dedicated runtimes are optimized for multithreading, which may conflict with Gaudi's own multithreading management.
- Constructing ML pipelines (comprising preprocessing, inference, and postprocessing) requires C++ development, a skill set often distinct from that of ML engineers, who typically work in Python.
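The batching problem above can be illustrated with a minimal sketch (the model, shapes, and event counts are invented for illustration, not taken from Lamarr):

```python
import numpy as np

def model(features: np.ndarray) -> np.ndarray:
    """Stand-in for a lightweight ML model: a single linear layer."""
    weights = np.array([0.5, -1.0, 2.0])
    return features @ weights

# Event-level scheduling: the model is invoked once per event,
# on the few particles that event contains.
rng = np.random.default_rng(0)
events = [rng.random((3, 3)) for _ in range(100)]  # 100 events, 3 particles each
per_event = [model(ev) for ev in events]           # 100 small invocations

# Cross-event batching: a single invocation over all particles at once.
batch = np.vstack(events)                          # shape (300, 3)
batched = model(batch)                             # 1 large invocation

# The results are identical; the difference is per-call overhead,
# which dominates when each call dispatches into a dedicated runtime.
assert np.allclose(np.concatenate(per_event), batched)
```

With an event-level scheduler only the first pattern is available, which is why lightweight models still incur significant runtime overhead.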
To address these challenges, Lamarr adopted an XML-based pipeline description language. This enables the composition of in-process computing blocks, distributed as shared objects via CVMFS. These blocks are transpiled from Python to C using tools such as scikinC and keras2c. This strategy shares conceptual similarities with SOFIE, a framework developed by CERN EP-SFT and used by LHCb.
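A hypothetical sketch of such an XML pipeline description follows; the element names, attributes, and CVMFS paths are illustrative only, not Lamarr's actual schema:

```xml
<!-- Illustrative only: composes three in-process computing blocks,
     each loaded from a shared object deployed on CVMFS. -->
<pipeline name="example_pid">
  <block name="preprocessing"
         library="/cvmfs/lhcb.cern.ch/…/libpreprocess.so"
         symbol="standard_scaler"/>   <!-- transpiled with scikinC -->
  <block name="inference"
         library="/cvmfs/lhcb.cern.ch/…/libmodel.so"
         symbol="pid_model"/>         <!-- transpiled with keras2c -->
  <block name="postprocessing"
         library="/cvmfs/lhcb.cern.ch/…/libpostprocess.so"
         symbol="inverse_scaler"/>
</pipeline>
```

The key property is that the pipeline is composed declaratively, so ML engineers can develop and validate each block in Python while the C++ application loads only transpiled shared objects.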
We propose a collaborative project to gather requirements and draft an implementation plan for a multi-experiment, multi-application ML deployment system. This system would target high-throughput computing (HTC) environments and multithreaded C++ applications.
Key considerations include:
- Intermediate Data Representation: Efficient in-memory formats for intermediate data exchanged between computing blocks, supporting batch processing and cross-language accessibility (e.g., C++ and Python). Apache Arrow Tables and ROOT's RDataFrame are promising examples.
- Experiment Independence: Leveraging Lamarr’s architecture as a foundation for a generalized, experiment-agnostic framework.
- Graph-Based Data Structures: Enabling the definition and execution of ML pipelines on heterogeneous graph data representing particles, vertices, and reconstructed physics objects.
We believe that Lamarr’s implementation offers a valuable starting point and could serve as a prototype for a broader, experiment-independent solution.
CERN group / Experiment
LHCb, EP-SFT
| Working area | Area 1: Cutting Edge AI for Offline Data Processing |
|---|---|
| Project goals | Participate in the activity, using Lamarr as one of the examples/backbones for a common end-to-end flash simulation. |
| Timeline | 3 years |
| Available person power | 0.1 FTE |
| Additional person power request | 1 FTE |
| Is this an already ongoing activity? | Yes |