Conference on Computing in High Energy and Nuclear Physics

Name: Conference on Computing in High Energy and Nuclear Physics
Start: 2024-10-19T08:00:00+02:00
End: 2024-10-25T18:30:00+02:00

19–25 Oct 2024

Europe/Zurich timezone

Contact Program Chairs

chep2024-pc@cern.ch

On-the-fly data set joins and concatenations with ROOT RNTuple

23 Oct 2024, 14:24

18m

Large Hall A

Talk

Track 5 - Simulation and analysis tools

Parallel (Track 5)

Florine de Geus (CERN/University of Twente (NL))

With the large increase in data volume expected for the HL-LHC and the even more complex computing challenges posed by future colliders, the need for more elaborate data access patterns will become more pressing. ROOT’s next-generation data format and I/O subsystem, RNTuple, is designed to address these challenges and already shows a clear improvement in storage and I/O efficiency over its predecessor, TTree. These improvements provide a solid baseline for introducing extensions that directly target common HENP workflow features that were not easily achievable before. Notably, many workflows benefit from the ability to join and concatenate data sets at application runtime, with the aim of reducing overall storage requirements and improving application ergonomics. Successfully implementing such compositions requires careful consideration of several factors, especially for large data sets that do not fit in memory. These factors include the transparent handling of (in)compatibility between different data sets, the rules that determine how data set compositions are processed, and their effects on runtime performance. In this contribution, we will present the ongoing work to support advanced composition of RNTuple data sets. We will discuss the main design considerations through a selection of concrete workflow use cases, the interfaces and internal machinery that enable the compositions, and an initial set of performance evaluation results.

Florine de Geus (CERN/University of Twente (NL)); Dr Vincenzo Eduardo Padulano (CERN); Jakob Blomer (CERN); Philippe Canal (Fermi National Accelerator Lab. (US)); Ana-Lucia Varbanescu (University of Twente)

CHEP2024_RNTupleConcatenationsJoins.pdf
