Speaker
Dr
Vincenzo Eduardo Padulano
(CERN)
Description
Training ML models on High Energy Physics (HEP) data currently requires either expensive copies and conversion to an intermediate format, or custom I/O pipelines written by the end user. ROOT provides a prototype system that ingests data in the common TTree format directly into the ML model; the prototype also supports the upcoming RNTuple format. This requires no conversion steps and takes a single function call from the end user. The streamlined approach can be made generic and cross-experiment. Further work is required to bring the prototype into production and to test it in distributed scenarios and in training that involves GPUs.
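The idea of feeding columnar data to a training loop without an intermediate file can be sketched in plain Python. This is only an illustration of chunked batch loading, not ROOT's actual API: the function `iter_batches`, its parameters, and the in-memory `columns` dictionary standing in for a TTree are all hypothetical.

```python
import random
from typing import Dict, Iterator, List, Sequence

# Hypothetical stand-in for a columnar dataset: column name -> values.
Columns = Dict[str, Sequence[float]]


def iter_batches(
    columns: Columns, batch_size: int, chunk_size: int, seed: int = 0
) -> Iterator[List[List[float]]]:
    """Yield shuffled batches of rows, reading one chunk of entries at a time."""
    names = list(columns)
    n_entries = len(columns[names[0]])
    rng = random.Random(seed)
    leftover: List[List[float]] = []
    for start in range(0, n_entries, chunk_size):
        stop = min(start + chunk_size, n_entries)
        # Only this slice of each column is turned into rows at once.
        rows = [[columns[name][i] for name in names] for i in range(start, stop)]
        rng.shuffle(rows)  # shuffle within the chunk
        rows = leftover + rows
        n_full = len(rows) // batch_size * batch_size
        for b in range(0, n_full, batch_size):
            yield rows[b:b + batch_size]
        leftover = rows[n_full:]  # carry incomplete batch to the next chunk
    if leftover:
        yield leftover  # final, possibly smaller, batch


# Toy dataset with two made-up columns.
columns = {"pt": list(range(10)), "eta": [0.5 * x for x in range(10)]}
batches = list(iter_batches(columns, batch_size=4, chunk_size=5))
```

A training loop would consume `iter_batches(...)` directly, so only one chunk is held in memory as rows at any time and no converted copy of the dataset is written.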
CERN group / Experiment
EP-SFT
| Working area | Area 4: AI Infrastructure for Model Training |
|---|---|
| Project goals | Problem: Common ML tools do not natively support loading data in HEP formats. Intermediate goal: Benchmark native ROOT data loading into batches for ML training across multiple ML models, datasets, and computing platforms. Final goal: Develop an easy-to-use API that provides native loading of ROOT datasets into ML models in an efficient and scalable way, removing the need for intermediate data conversions and unnecessary bookkeeping. |
| Timeline | Year 1: Research typical physics use cases that employ ML training workflows, and use this knowledge to benchmark and profile the existing prototype under realistic scenarios. Provide regular reports and take stock by identifying the most important optimizations and missing features. Year 2: Act on the knowledge accumulated in Year 1; extend the data loading tool and bring it to production-grade level. Year 3: Demonstrate possible integration of the tool into experiment frameworks and analyses that currently require expensive data duplication and bookkeeping. |
| Available person power | 0 |
| Additional person power request | 1 Graduate + 0.2 Staff for supervision |
| Is this an already ongoing activity? | Yes |
| Indicative hardware resource needs | 1 PC equipped with a GPU |
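The benchmarking goal in the table above amounts to measuring how fast a loader can hand batches to the training loop. A minimal sketch of such a measurement follows; `measure_throughput` and the toy loader are hypothetical names used for illustration, and a real benchmark would pass in the actual data loader under test.

```python
import time
from typing import Callable, Dict, Iterable, Sized


def measure_throughput(
    make_batches: Callable[[], Iterable[Sized]], repeats: int = 3
) -> Dict[str, float]:
    """Iterate over a batch loader several times and report the best rows/s."""
    best_rate = 0.0
    n_rows = 0
    for _ in range(repeats):
        n_rows = 0
        start = time.perf_counter()
        for batch in make_batches():
            n_rows += len(batch)  # count rows delivered to the consumer
        elapsed = time.perf_counter() - start
        rate = n_rows / elapsed if elapsed > 0 else float("inf")
        best_rate = max(best_rate, rate)
    return {"rows": n_rows, "rows_per_second": best_rate}


# Toy loader: 100 batches of 4 rows each, standing in for a real data loader.
result = measure_throughput(lambda: ([0.0] * 4 for _ in range(100)))
```

Taking the best of several repeats reduces noise from caching and scheduling; comparing the figure across models, datasets, and platforms gives the kind of profile the Year 1 plan calls for.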
Author
Dr
Vincenzo Eduardo Padulano
(CERN)