8–12 Sept 2025
Hamburg, Germany
Europe/Berlin timezone

Zero-overhead ML training with ROOT in an ATLAS Open Data analysis

Not scheduled
30m

Poster
Track 1: Computing Technology for Physics Research
Poster session with coffee break

Speaker

Martin Foll (University of Oslo (NO))

Description

The ROOT software framework is widely used in HEP for the storage, processing, analysis and visualization of large datasets. With the sharp increase in the use of ML in experiment workflows, particularly in the final steps of the analysis pipeline, exposing ROOT data ergonomically to ML models is becoming ever more pressing. In this contribution we discuss the experimental ROOT component that exposes ROOT datasets in batches ready for the training phase. We present a new shuffling strategy for creating the batches that prevents biased training, using real-life use cases from ATLAS Open Data as examples.
An end-to-end ML physics analysis is carried out to show how a model can be trained with common ML tools directly from ROOT datasets. This avoids intermediate data conversions, streamlines workflows and can be used when the training data does not fit in memory. Datasets from ATLAS Open Data are used as input to analyses searching for the Higgs boson or for new BSM particles such as supersymmetric particles. The datasets are stored in RNTuple, the new on-disk ROOT format.
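
As a rough illustration of why batch creation needs shuffling when the data does not fit in memory, the sketch below shows a generic chunk-based scheme in plain Python. It is not the ROOT component or the strategy presented in this contribution. The function `shuffled_batches` and its parameters (`chunk_size`, `chunks_per_block`, `seed`) are hypothetical names chosen for this example, and entries are represented only by their indices. The idea is to visit on-disk chunks in random order, mix the entries of a few chunks at a time, and carry incomplete batches over to the next block, so that no batch is drawn from a single contiguous region of the dataset.

```python
import random
from typing import Iterator, List


def shuffled_batches(
    n_entries: int,
    chunk_size: int,
    batch_size: int,
    chunks_per_block: int = 2,
    seed: int = 0,
) -> Iterator[List[int]]:
    """Yield batches of entry indices while holding only a few chunks at once.

    Hypothetical helper for illustration; not part of ROOT.
    """
    rng = random.Random(seed)

    # Contiguous chunks, as they would be read sequentially from disk.
    chunk_starts = list(range(0, n_entries, chunk_size))
    # Visit the chunks in random order.
    rng.shuffle(chunk_starts)

    leftover: List[int] = []
    for i in range(0, len(chunk_starts), chunks_per_block):
        # Start from the entries that did not fill a batch in the last block.
        block = list(leftover)
        # "Load" the next few chunks and pool their entries.
        for start in chunk_starts[i:i + chunks_per_block]:
            block.extend(range(start, min(start + chunk_size, n_entries)))
        # Mix entries across the pooled chunks.
        rng.shuffle(block)

        # Emit only full batches; keep the remainder for the next block.
        n_full = (len(block) // batch_size) * batch_size
        for j in range(0, n_full, batch_size):
            yield block[j:j + batch_size]
        leftover = block[n_full:]

    # Final, possibly smaller, batch.
    if leftover:
        yield leftover


if __name__ == "__main__":
    for batch in shuffled_batches(n_entries=20, chunk_size=5, batch_size=4):
        print(batch)
```

In this sketch the memory footprint is bounded by `chunks_per_block * chunk_size + batch_size` entries, and a larger `chunks_per_block` gives better mixing at the cost of more memory. In a real workflow each index would be replaced by the corresponding row of features and labels read from the dataset.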

References

CHEP 2024: Zero-overhead training of machine learning models with ROOT data
https://indico.cern.ch/event/1338689/contributions/6015940/

Significance

This presentation covers a new shuffling strategy in the experimental ROOT component that exposes ROOT datasets in batches ready for the training phase. The strategy prevents biased training when the batches are used with common ML tools. The component enables ML training directly from ROOT datasets, which avoids intermediate data conversions, streamlines workflows and can be used when the training data does not fit in memory. In this contribution an end-to-end physics analysis shows how the component can be used to train a model with common ML tools, with ATLAS Open Data as input datasets stored in RNTuple, the new on-disk ROOT format.

Experiment context, if any

ATLAS

Authors

Martin Foll (University of Oslo (NO))
Dr Vincenzo Eduardo Padulano (CERN)
Danilo Piparo (CERN)
Prof. Farid Ould-Saada (University of Oslo (NO))
Dr Eirik Gramstad (University of Oslo (NO))
James Catmore (University of Oslo (NO))
