ACAT 2025

Name: ACAT 2025
Start: 2025-09-08T08:00:00+02:00
End: 2025-09-12T16:30:00+02:00
Location: Hamburg, Germany

Europe/Berlin timezone

ColliderML: The First Release of an OpenDataDetector High-Luminosity Physics Benchmark Dataset

8 Sept 2025, 11:00

30m

ESA W 'West Wing'

Poster Track 2: Data Analysis - Algorithms and Tools

Poster session with coffee break

Daniel Thomas Murnane (Niels Bohr Institute, University of Copenhagen)

Particle physics requires high-quality simulation to match the precision with which data are gathered at collider experiments such as the Large Hadron Collider (LHC). The computational demands of full detector simulation often lead to the use of faster but less realistic parameterizations, potentially compromising the sensitivity, generalizability, and robustness of downstream machine learning (ML) models. To address this, we introduce the OpenDataDetector High-Luminosity Physics Benchmark Dataset 2025, also known as "ColliderML". It includes O(1 million) realistically simulated and digitized high-pileup collision events across O(10) important Standard Model (SM) and beyond-the-Standard-Model (BSM) channels. A variety of objects are available, from energy deposits in the tracker and calorimeters up to reconstructed tracks and jets, as well as a large dataset of particle gun simulations. The OpenDataDetector geometry itself provides a realistic combination of several next-generation detector technologies.

To demonstrate ColliderML's utility, we showcase multiple ML benchmarks that evaluate the performance and behavior of models trained under diverse collider conditions. These evaluations examine generalizability between fast and full simulation and across physics channels, the benefits of low-level and full-detector features, and robustness in handling complex and noisy collider data. We also provide an accompanying software library that streamlines dataset access and manipulation. Because we find that large ML models plateau in performance on high-level physics objects, we propose ColliderML as an essential tool for exploring the next generation of ML on low-level collider data.

Significance

The largest full-simulation dataset of experiment-agnostic low-level data was previously TrackML (https://www.kaggle.com/competitions/trackml-particle-identification), released in 2018 with 10k events. We aim to improve on this with 100x more data, full detector coverage (calorimeter + tracker), better digitizations, and reconstructed objects. We believe this is a major milestone for ML studies on low-level open data.

References

https://iopscience.iop.org/article/10.1088/1742-6596/2438/1/012110/pdf

Andreas Salzburger (CERN); Anna Zaborowska (CERN); Daniel Thomas Murnane (Niels Bohr Institute, University of Copenhagen); Minh-Tuan Pham (University of Wisconsin Madison (US)); Paul Gessinger (CERN)

Poster_draft_2.pdf
