AI RCS Strategy Workshop

Name: AI RCS Strategy Workshop
Start: 2025-09-15T08:09:00+02:00
End: 2025-09-19T18:00:00+02:00
Location: CERN

Zero-conversion reading of HEP data for training with common ML tools

16 Sept 2025, 11:15

40/S2-A01 - Salle Anderson (CERN)

4. AI Infrastructure for Model Training

Dr Vincenzo Eduardo Padulano (CERN)

Training ML models on High Energy Physics (HEP) data currently requires either expensive copying and conversion of the data to an intermediate format, or the creation of custom I/O pipelines by the end user. ROOT provides a prototype system that ingests data stored in the common TTree format directly into the ML model; the same system also supports the future RNTuple format. It requires no conversion step and is exposed to the end user as a single function call. This streamlined approach to data ingestion can be made generic and cross-experiment. Further work is required to bring the prototype into production and to test it in distributed scenarios and in training workflows that use GPUs.
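To make the idea of chunked, conversion-free batch loading concrete, here is a minimal sketch in plain Python. It is not ROOT's API: the function `iter_batches`, its parameters, and the dictionary of lists standing in for TTree branches are all hypothetical names chosen for illustration. The sketch only shows the general pattern of reading a bounded chunk of rows, shuffling within it, and handing fixed-size batches to a training loop.

```python
import random
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical stand-in for columnar HEP data: one sequence per branch.
Columns = Dict[str, Sequence[float]]


def iter_batches(
    columns: Columns,
    feature_names: List[str],
    target_name: str,
    batch_size: int,
    chunk_size: int,
    seed: int = 0,
) -> Iterator[Tuple[List[List[float]], List[float]]]:
    """Yield (features, targets) batches, visiting one chunk of rows at a time."""
    if batch_size <= 0 or chunk_size <= 0:
        raise ValueError("batch_size and chunk_size must be positive")
    n_rows = len(columns[target_name])
    rng = random.Random(seed)
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        # Only the rows of the current chunk are touched in this iteration.
        order = list(range(start, stop))
        rng.shuffle(order)  # shuffle within the chunk, not across chunks
        # Batches do not cross chunk boundaries, so the last batch of a
        # chunk may hold fewer than batch_size rows.
        for first in range(0, len(order), batch_size):
            rows = order[first:first + batch_size]
            features = [[columns[name][i] for name in feature_names] for i in rows]
            targets = [columns[target_name][i] for i in rows]
            yield features, targets


if __name__ == "__main__":
    # Toy dataset with made-up branch names.
    toy = {
        "pt": [float(i) for i in range(10)],
        "eta": [0.1 * i for i in range(10)],
        "label": [float(i % 2) for i in range(10)],
    }
    for features, targets in iter_batches(toy, ["pt", "eta"], "label", 4, 5):
        print(len(features), len(targets))
```

In this sketch the chunk size bounds how many rows are considered at once, while the batch size is what the model sees; a real loader would read each chunk from file rather than from an in-memory dictionary.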

CERN group/ Experiment

EP-SFT

Working area	Area 4: AI Infrastructure for Model Training
Project goals	Problem: Common ML tools do not natively support loading data stored in HEP formats.
	Intermediate goal: Benchmark native ROOT data loading into batches for ML training across multiple ML models, datasets and computing platforms.
	Final goal: Develop an easy-to-use API that provides native loading of ROOT datasets into ML models in an efficient and scalable way, removing the need for intermediate data conversions and unnecessary bookkeeping.
Timeline	Year 1: Research typical physics use cases that employ ML training workflows. Use this knowledge to benchmark and profile the existing prototype in realistic scenarios. Report continuously and take stock by identifying the most important optimizations and missing features.
	Year 2: Act on the knowledge accumulated in Year 1; extend the data loading tool and bring it to production-grade level.
	Year 3: Demonstrate possible integration of the tool into experiment frameworks and analyses that currently require expensive data duplication and bookkeeping.
Available person power	0
Additional person power request	1 Graduate + 0.2 Staff for supervision
Is this an already ongoing activity?	Yes
Indicative hardware resource needs	1 PC equipped with a GPU
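The intermediate goal of benchmarking batch loading can be illustrated with a small timing harness. This is a sketch under stated assumptions: `measure_throughput` and `toy_loader` are hypothetical names, and the toy loader merely generates numbers in memory where a real benchmark would plug in the ROOT-based loader.

```python
import time
from typing import Callable, Dict, Iterable, Iterator, List


def measure_throughput(
    make_loader: Callable[[], Iterable[List[float]]],
    repeats: int = 3,
) -> Dict[str, float]:
    """Iterate a fresh loader `repeats` times and report the best run."""
    if repeats <= 0:
        raise ValueError("repeats must be positive")
    best = float("inf")
    n_events = 0
    n_batches = 0
    for _ in range(repeats):
        n_events = 0
        n_batches = 0
        start = time.perf_counter()
        for batch in make_loader():
            n_events += len(batch)
            n_batches += 1
        best = min(best, time.perf_counter() - start)
    rate = n_events / best if best > 0 else float("inf")
    return {
        "events": n_events,
        "batches": n_batches,
        "best_seconds": best,
        "events_per_second": rate,
    }


def toy_loader(n_events: int = 10_000, batch_size: int = 256) -> Iterator[List[float]]:
    """Hypothetical stand-in for a real data loader: yields batches of floats."""
    for first in range(0, n_events, batch_size):
        last = min(first + batch_size, n_events)
        yield [float(i) for i in range(first, last)]


if __name__ == "__main__":
    print(measure_throughput(lambda: toy_loader(100_000, 1024)))
```

Taking the best of several runs reduces the influence of one-off effects such as cold caches; a fuller study would also vary the model, dataset and platform, as the goals above describe.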
