Description
In the HL-LHC era, ever-larger datasets for ML training are in sight. These will enable the training of increasingly complex models, but the sheer volume of data may exhaust the capabilities of the machines that run the training: the data might not fit in RAM, and keeping it on fast local storage might not be cost-effective.
In this project, ROOT and existing CERN infrastructure such as EOS, SWAN, or OpenStack will be combined into a data-loading cluster, making it possible to scale the loading and filtering of training data across a large pool of CERN resources. The data could be prepared asynchronously on a multitude of hosts, partitioned into batches ready to be consumed by various ML frameworks, and streamed over the network as the training progresses.
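The producer/consumer pattern described above can be sketched in plain Python: a background thread stands in for the remote hosts that filter events and partition them into batches, while the training loop consumes batches from a bounded queue as they arrive. This is a minimal illustration only; the event records, the filter, and all function names are hypothetical, and a real deployment would read from EOS via ROOT and stream across the network rather than through an in-process queue.

```python
import queue
import threading

BATCH_SIZE = 4
NUM_EVENTS = 20


def load_events():
    # Stand-in for reading events from remote storage (e.g. EOS);
    # here we simply generate dummy event records.
    for i in range(NUM_EVENTS):
        yield {"event_id": i, "value": i}


def producer(batch_queue):
    """Asynchronously filter events and partition them into batches."""
    batch = []
    for event in load_events():
        if event["value"] % 2 == 0:  # example selection cut
            batch.append(event)
        if len(batch) == BATCH_SIZE:
            batch_queue.put(batch)
            batch = []
    if batch:
        batch_queue.put(batch)
    batch_queue.put(None)  # sentinel: no more batches


def stream_batches():
    """Yield batches as they become available, as a training loop would."""
    # A bounded queue applies back-pressure: the producer pauses when
    # the consumer (the training step) falls behind.
    batch_queue = queue.Queue(maxsize=2)
    threading.Thread(target=producer, args=(batch_queue,), daemon=True).start()
    while (batch := batch_queue.get()) is not None:
        yield batch


batches = list(stream_batches())
```

The bounded queue is the key design choice: it decouples data preparation from training speed without letting prepared batches accumulate unboundedly in memory, which is the same back-pressure behaviour a networked data-loading cluster would need.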
CERN group / Experiment
EP-SFT
| Working area | Area 4: AI Infrastructure for Model Training |
|---|---|
| Project goals | Prepare ROOT for a time when training datasets no longer fit on a single machine; leverage existing CERN resources to facilitate large-scale training |
| Timeline | One year |
| Available person power | 0.1 Staff |
| Additional person power request | 0.5 FTE Fellow/Grad |
| Is this an already ongoing activity? | No |
| Indicative hardware resources needs | - |