Description
The ROOT software framework is widely used in HEP for the storage, processing, analysis and visualization of large datasets. With the growing use of ML in experiment workflows, particularly in the final steps of the analysis pipeline, exposing ROOT data ergonomically to ML models becomes ever more pressing. In this contribution we discuss the experimental ROOT component that exposes ROOT datasets in batches ready for the training phase. We present a new shuffling strategy for creating the batches that prevents biased training, illustrated with real-life use cases based on ATLAS Open Data.
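To make the idea of out-of-core batching with shuffling concrete, the following is a minimal sketch of chunk-level shuffling, the general technique behind such batch generators; it is illustrative only, and the actual strategy implemented in ROOT may differ. The `read_chunk` callback is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def shuffled_batches(read_chunk, n_entries, chunk_size, batch_size):
    """Yield shuffled, fixed-size batches from a dataset read chunk by chunk.

    `read_chunk(start, stop)` is a hypothetical callback returning the
    rows in [start, stop) as a NumPy array; only one chunk is ever
    resident in memory at a time.
    """
    starts = np.arange(0, n_entries, chunk_size)
    rng.shuffle(starts)                    # randomize the order chunks are read from disk
    for start in starts:
        chunk = read_chunk(start, min(start + chunk_size, n_entries))
        rng.shuffle(chunk)                 # randomize rows within the loaded chunk
        for i in range(0, len(chunk) - batch_size + 1, batch_size):
            yield chunk[i:i + batch_size]  # emit full batches, drop the remainder
```

Shuffling at both the chunk and the in-chunk level avoids presenting the model with long runs of entries in on-disk order, which is one source of the biased training mentioned above.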
An end-to-end ML physics analysis demonstrates how a model can be trained with common ML tools directly from ROOT datasets, avoiding intermediate data conversions, streamlining workflows and supporting the case where the training data does not fit in memory. Datasets from ATLAS Open Data serve as input to analyses searching for the Higgs boson or new BSM particles such as supersymmetric particles; they are stored in RNTuple, the new on-disk ROOT format.
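A hedged sketch of what such a workflow can look like in Python follows. The `CreateTFDatasets` helper name is taken from the ROOT TMVA Experimental batch-generator tutorials, but exact names and signatures vary across ROOT versions; the file, tree and column names below are hypothetical.

```python
import ROOT
import tensorflow as tf

# Hypothetical tree and file names standing in for an ATLAS Open Data sample.
df = ROOT.RDataFrame("events", "atlas_open_data.root")

# Assumed helper from TMVA Experimental (per the ROOT tutorials);
# it wraps the batch generator and returns tf.data.Dataset objects.
ds_train, ds_valid = ROOT.TMVA.Experimental.CreateTFDatasets(
    df,
    batch_size=128,
    chunk_size=5_000,       # entries loaded into memory at a time
    validation_split=0.3,
    target="label",         # hypothetical target column
)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Batches are produced on the fly from ROOT storage, so the full
# dataset never needs to fit in memory and no intermediate
# conversion (e.g. to HDF5 or NumPy files) is required.
model.fit(ds_train, validation_data=ds_valid, epochs=2)
```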
References
CHEP 2024: Zero-overhead training of machine learning models with ROOT data
https://indico.cern.ch/event/1338689/contributions/6015940/
Significance
This presentation covers a new shuffling strategy in the experimental ROOT component that exposes ROOT datasets in batches ready for the training phase, preventing biased training when used with common ML tools. The component enables ML training directly from ROOT datasets, avoiding intermediate data conversions, streamlining workflows and supporting the case where the training data does not fit in memory. In this contribution an end-to-end physics analysis shows how it can be used to train a model with common ML tools, with ATLAS Open Data as input datasets stored in RNTuple, the new on-disk ROOT format.
Experiment context, if any: ATLAS