25–29 May 2026
Chulalongkorn University
Asia/Bangkok timezone

Exploring lossy storage for analysis data with ROOT's RNTuple

26 May 2026, 16:33
18m
Chulalongkorn University


Oral Presentation, Track 3 - Offline data processing

Speaker

Florine Willemijn de Geus (CERN/University of Twente (NL))

Description

With the data deluge expected to come with the High-Luminosity LHC and limited storage resources, the need to reduce the on-disk size of High-Energy Physics (HEP) data becomes even more pressing. Lossless compression algorithms and encodings are already used extensively across all experiments' data tiers, often leading to significant reductions in the total on-disk data volume of the collaborations. However, these future storage challenges naturally raise the question of whether more could be done. One potential next step to reduce data volumes even further is to store physics analysis data using lossy encoding schemes. The challenge with this approach is the inherent loss of precision and the (perceived) lack of predictability of its effects. In this contribution, we explore the impact of lossy compression on HEP data stored in ROOT's new RNTuple data format, which offers fine-grained mechanisms for low-precision data storage. We do this by evaluating different lossy encodings applied to a selection of particle quantities and mapping out their effects on an analysis based on open data. With this evaluation, we aim to help the community make informed decisions about the use of lossy compression for their use cases.
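The abstract does not spell out what "low-precision data storage" means in practice. As a rough, library-independent illustration (not RNTuple's implementation or API), the sketch below shows two common lossy schemes for floating-point quantities. The first drops low-order mantissa bits of a 32-bit float. The second quantizes a value from an assumed fixed range [lo, hi] to an n-bit integer. The function names, the ranges, and the example values (a transverse momentum in GeV, a pseudorapidity) are hypothetical.

```python
import struct


def truncate_mantissa(value: float, kept_bits: int) -> float:
    """Round `value` to float32, then zero all but `kept_bits` of its 23 mantissa bits.

    Zeroing bits rounds toward zero in magnitude; the sign and exponent are kept.
    """
    if not 0 <= kept_bits <= 23:
        raise ValueError("kept_bits must be in [0, 23]")
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    dropped = 23 - kept_bits
    # Mask that clears the `dropped` least-significant mantissa bits.
    bits &= ~((1 << dropped) - 1) & 0xFFFFFFFF
    (out,) = struct.unpack("<f", struct.pack("<I", bits))
    return out


def quantize(value: float, lo: float, hi: float, n_bits: int) -> int:
    """Map `value` (clamped to [lo, hi]) to an integer in [0, 2**n_bits - 1]."""
    steps = (1 << n_bits) - 1
    clamped = min(max(value, lo), hi)
    return round((clamped - lo) / (hi - lo) * steps)


def dequantize(q: int, lo: float, hi: float, n_bits: int) -> float:
    """Reconstruct the approximate value from its n-bit integer code."""
    steps = (1 << n_bits) - 1
    return lo + (hi - lo) * q / steps


if __name__ == "__main__":
    pt = 42.123456  # hypothetical transverse momentum in GeV
    for kept in (23, 16, 10, 7):
        approx = truncate_mantissa(pt, kept)
        print(f"kept={kept:2d} bits  value={approx:.6f}  rel. error={abs(approx - pt) / pt:.2e}")

    eta = -1.2345  # hypothetical pseudorapidity, assumed range [-5, 5], 12 bits
    q = quantize(eta, -5.0, 5.0, 12)
    print(f"eta code={q}  reconstructed={dequantize(q, -5.0, 5.0, 12):.6f}")
```

The two schemes bound different kinds of error. Keeping k mantissa bits bounds the relative error (below roughly 2^-k), which suits quantities spanning several orders of magnitude such as momenta. Quantization bounds the absolute error to half a step, (hi - lo) / (2(2^n - 1)), which suits bounded quantities such as angles or pseudorapidity.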

Author

Florine Willemijn de Geus (CERN/University of Twente (NL))

Co-authors

Dr Vincenzo Eduardo Padulano (CERN), Jakob Blomer (CERN), Ana-Lucia Varbanescu (University of Twente), Philippe Canal (Fermi National Accelerator Lab. (US))

Presentation materials

There are no materials yet.