15–19 Sept 2025
CERN
Europe/Zurich timezone

Distributed Data-Loading Pipelines with ROOT for Large-Scale ML Training

16 Sept 2025, 11:10
5m
40/S2-A01 - Salle Anderson (CERN)

4. AI Infrastructure for Model Training

Speaker

Stephan Hageboeck (CERN)

Description

In the HL-LHC era, ever-larger datasets for ML training are in sight. These will enable the training of increasingly complex models, but the sheer volume of data may exhaust the capabilities of the machines that run the training: the data may not fit in RAM, and saving it on fast local storage may not be cost-effective.
In this project, ROOT and existing CERN infrastructure such as EOS, SWAN and OpenStack will be combined into a data-loading cluster, allowing the loading and filtering of training data to scale across a large pool of CERN resources. The data could be prepared asynchronously on a multitude of hosts, partitioned into batches ready to be consumed by various ML frameworks, and streamed over the network as the training progresses.
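The pattern described above — batches prepared asynchronously by many producers and consumed as a stream by the training loop — can be sketched with Python standard-library primitives. This is a hedged illustration of the concept only: the threads, partition lists, and the `prepare_batches`/`stream_batches` names are stand-ins invented here, not the project's actual ROOT-based API, and real workers would run on remote hosts and stream over the network.

```python
# Sketch of asynchronous batch preparation with streaming consumption.
# Worker threads stand in for remote data-loading hosts; a bounded queue
# stands in for the network stream and provides backpressure.
import queue
import threading

BATCH_SIZE = 4
STOP = object()  # sentinel: a worker has exhausted its partition

def prepare_batches(events, out_queue):
    """Filter one data partition and emit fixed-size batches (one worker)."""
    selected = [e for e in events if e % 2 == 0]  # stand-in for a physics cut
    for i in range(0, len(selected), BATCH_SIZE):
        out_queue.put(selected[i:i + BATCH_SIZE])
    out_queue.put(STOP)

def stream_batches(partitions):
    """Yield batches as they become ready, interleaved across all workers."""
    out_queue = queue.Queue(maxsize=8)  # bounded: producers block when full
    workers = [threading.Thread(target=prepare_batches, args=(p, out_queue))
               for p in partitions]
    for w in workers:
        w.start()
    finished = 0
    while finished < len(workers):
        item = out_queue.get()
        if item is STOP:
            finished += 1
        else:
            yield item
    for w in workers:
        w.join()

if __name__ == "__main__":
    # Two partitions, as if the dataset were split across two hosts.
    partitions = [list(range(0, 20)), list(range(20, 40))]
    for batch in stream_batches(partitions):
        print(batch)  # the "training loop" consumes batches as they arrive
```

The bounded queue is the key design choice: the training loop never waits for the full dataset to be materialized, and producers pause automatically when the consumer falls behind, so memory use stays constant regardless of dataset size.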

CERN group/ Experiment

EP-SFT

Working area Area 4: AI Infrastructure for Model Training
Project goals Prepare ROOT for a time when datasets for training don't fit on a single machine; leverage existing CERN resources to facilitate large-scale training
Timeline One year
Available person power 0.1 Staff
Additional person power request 0.5 FTE Fellow/Grad
Is this an already ongoing activity? No
Indicative hardware resource needs -
