Oct 27 – 30, 2025
CERN
Europe/Zurich timezone

Zero-overhead ML training from Python with ROOT in an ATLAS Open Data analysis

Oct 28, 2025, 2:50 PM
10m
222/R-001 (CERN)

Speaker

Martin Foll (University of Oslo (NO))

Description

The ROOT software framework is widely used from Python in HEP for the storage, processing, analysis and visualization of large datasets. As experiment workflows make growing use of ML tools from the Python ecosystem, particularly in the final steps of the analysis pipeline, exposing ROOT data ergonomically to ML models has become increasingly pressing. In this contribution we discuss the experimental component of ROOT that exposes ROOT datasets in batches ready for the training phase. We also discuss a new shuffling strategy for creating these batches that prevents biased training, using real-life use cases drawn from ATLAS Open Data as examples.
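To illustrate why shuffling matters when batches are streamed from disk, here is a minimal sketch in plain Python of a generic two-level chunk shuffle. It assumes a dataset stored in a fixed order (for example all background entries before all signal entries) that can only be read in contiguous ranges. The names `shuffled_batches`, `read_range`, `chunk_size` and `chunks_in_buffer` are hypothetical and chosen for this example. This is not ROOT's API and not necessarily the shuffling strategy presented in this contribution. The sketch shuffles the order in which chunks are read, then shuffles rows across a small buffer of loaded chunks, so each batch mixes entries from distant parts of the file while memory stays bounded.

```python
import random
from typing import Callable, Iterator, List, Sequence

Row = Sequence[float]


def shuffled_batches(
    n_entries: int,
    read_range: Callable[[int, int], List[Row]],
    batch_size: int,
    chunk_size: int,
    chunks_in_buffer: int,
    seed: int = 0,
) -> Iterator[List[Row]]:
    """Yield shuffled batches while holding only chunks_in_buffer chunks
    (plus a remainder smaller than one batch) in memory.

    Hypothetical helper illustrating a generic two-level shuffle; it is not
    ROOT's API or the exact strategy presented in the talk.
    """
    rng = random.Random(seed)
    # Level 1: split [0, n_entries) into contiguous chunks and shuffle their
    # order, so consecutive reads come from unrelated parts of the dataset.
    starts = list(range(0, n_entries, chunk_size))
    rng.shuffle(starts)
    leftover: List[Row] = []
    for i in range(0, len(starts), chunks_in_buffer):
        buffer = list(leftover)
        for start in starts[i:i + chunks_in_buffer]:
            buffer.extend(read_range(start, min(start + chunk_size, n_entries)))
        # Level 2: shuffle individual rows across all chunks currently loaded.
        rng.shuffle(buffer)
        n_full = len(buffer) // batch_size * batch_size
        for j in range(0, n_full, batch_size):
            yield buffer[j:j + batch_size]
        # Carry the incomplete remainder into the next buffer instead of dropping it.
        leftover = buffer[n_full:]
    if leftover:
        yield leftover  # final, smaller batch


# Toy stand-in for a dataset stored in order: entries 0-499 are "background"
# (label 0.0) and entries 500-999 are "signal" (label 1.0).
def read_range(start: int, stop: int) -> List[Row]:
    return [(float(i), float(i >= 500)) for i in range(start, stop)]


# Materialized here only for inspection; a training loop would iterate lazily.
batches = list(shuffled_batches(1000, read_range, batch_size=32,
                                chunk_size=50, chunks_in_buffer=4, seed=1))
```

Reading without the first level would place every early batch entirely in one class. Larger `chunks_in_buffer` values give better mixing at the cost of more memory per buffer.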
An end-to-end ML physics analysis using ATLAS Open Data is carried out to show how a model can be trained with common ML tools directly from ROOT datasets, avoiding intermediate data conversions, streamlining workflows, and supporting cases where the training data does not fit in memory.
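Training on data that does not fit in memory works because a typical training loop only needs an iterable of batches. The following minimal sketch in plain Python makes that point with mini-batch SGD for logistic regression. It assumes each batch is a list of `(features, label)` pairs. The batch source `synthetic_batches` is a hypothetical stand-in that generates toy data lazily. It is not a ROOT-backed generator, and the hand-written trainer substitutes for a real ML library. Because batches are consumed one at a time, memory use depends on the batch size rather than on the dataset size.

```python
import math
import random
from typing import Iterable, Iterator, List, Sequence, Tuple

Batch = List[Tuple[Sequence[float], float]]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_sgd(batches: Iterable[Batch], n_features: int, lr: float = 0.1):
    """One pass of mini-batch SGD for logistic regression over a stream of batches.

    Only the current batch is held here, so with a lazy batch source the
    dataset never has to be loaded into memory as a whole.
    """
    w = [0.0] * n_features
    b = 0.0
    for batch in batches:
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in batch:
            # Gradient of binary cross-entropy w.r.t. the logit is (p - y).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for k in range(n_features):
                grad_w[k] += err * x[k]
            grad_b += err
        n = len(batch)
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def synthetic_batches(n_batches: int, batch_size: int, seed: int = 0) -> Iterator[Batch]:
    """Hypothetical stand-in for a ROOT-backed batch generator: builds batches lazily."""
    rng = random.Random(seed)
    for _ in range(n_batches):
        batch: Batch = []
        for _ in range(batch_size):
            y = float(rng.random() < 0.5)
            # Feature 0 separates the classes; feature 1 is pure noise.
            x = [rng.gauss(1.0 if y else -1.0, 1.0), rng.gauss(0.0, 1.0)]
            batch.append((x, y))
        yield batch


w, b = train_logistic_sgd(synthetic_batches(200, 64), n_features=2)
```

In a real workflow the same loop shape applies when the toy source is replaced by a generator that reads batches from a ROOT dataset and the hand-written update is replaced by a framework optimizer.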

Authors

Danilo Piparo (CERN), Dr Eirik Gramstad (University of Oslo (NO)), Prof. Farid Ould-Saada (University of Oslo (NO)), James Catmore (University of Oslo (NO)), Martin Foll (University of Oslo (NO)), Dr Vincenzo Eduardo Padulano (CERN)
