Description
In the HL-LHC era, ever-larger datasets for ML training are in sight. These will enable the training of increasingly complex models, but the sheer volume of data may exhaust the capabilities of the machines that run the training: the data might not fit in RAM, and keeping it on fast local storage might not be cost-effective.
In this project, ROOT and existing CERN infrastructure such as EOS, SWAN, or OpenStack will be combined into a data-loading cluster, making it possible to scale the loading and filtering of training data across a large pool of CERN resources. The data could be prepared asynchronously on a multitude of hosts, partitioned into batches ready to be consumed by various ML frameworks, and streamed over the network as the training progresses.
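The producer/consumer pattern described above can be sketched in plain Python: a background thread stands in for the remote hosts that filter events and partition them into batches, while the training loop consumes batches from a bounded queue as they arrive. This is a minimal illustration only; the event records, the filter, and all function names are hypothetical, and a real deployment would read from EOS via ROOT and stream across the network rather than through an in-process queue.

```python
import queue
import threading

BATCH_SIZE = 4
NUM_EVENTS = 20


def load_events():
    # Stand-in for reading events from remote storage (e.g. EOS);
    # here we simply generate dummy event records.
    for i in range(NUM_EVENTS):
        yield {"event_id": i, "value": i}


def producer(batch_queue):
    """Asynchronously filter events and partition them into batches."""
    batch = []
    for event in load_events():
        if event["value"] % 2 == 0:  # example selection cut
            batch.append(event)
        if len(batch) == BATCH_SIZE:
            batch_queue.put(batch)
            batch = []
    if batch:
        batch_queue.put(batch)
    batch_queue.put(None)  # sentinel: no more batches


def stream_batches():
    """Yield batches as they become available, as a training loop would."""
    # A bounded queue applies back-pressure: the producer pauses when
    # the consumer (the training step) falls behind.
    batch_queue = queue.Queue(maxsize=2)
    threading.Thread(target=producer, args=(batch_queue,), daemon=True).start()
    while (batch := batch_queue.get()) is not None:
        yield batch


batches = list(stream_batches())
```

The bounded queue is the key design choice: it decouples data preparation from training speed without letting prepared batches accumulate unboundedly in memory, which is the same back-pressure behaviour a networked data-loading cluster would need.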
CERN group / Experiment
EP-SFT
| Working area | Area 4: AI Infrastructure for Model Training |
|---|---|
| Project goals | Prepare ROOT for a time when training datasets no longer fit on a single machine; leverage existing CERN resources to facilitate large-scale training |
| Timeline | One year |
| Available person power | 0.1 Staff |
| Additional person power request | 0.5 FTE Fellow/Grad |
| Is this an already ongoing activity? | No |
| Indicative hardware resources needs | - |