19–25 Oct 2024
Europe/Zurich timezone

Direct I/O for RNTuple Columnar Data

21 Oct 2024, 13:30
18m
Room 1.B (Medium Hall B)

Talk Track 3 - Offline Computing Parallel (Track 3)

Speaker

Jonas Hahnfeld (CERN & Goethe University Frankfurt)

Description

RNTuple is the new columnar data format designed as the successor to ROOT's TTree format. It makes use of modern hardware capabilities and is expected to be used in production by the LHC experiments during the HL-LHC. In this contribution, we discuss the use of Direct I/O to fully exploit modern SSDs, especially in the context of the recent addition of parallel RNTuple writing. In contrast to buffered I/O, where files are accessed via the operating system's page cache, Direct I/O circumvents all caching by the kernel and thereby enables higher bandwidths. However, to achieve this advantage, Direct I/O imposes strict alignment requirements on the I/O requests sent to the operating system: file offsets, byte counts, and userspace buffer addresses must all be aligned appropriately. This is challenging for columnar data formats, and in particular for RNTuple pages, which have variable size after compression. We will discuss possible strategies and present performance results for both synthetic benchmarks and real-world applications.
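To make the alignment constraint concrete, the following minimal sketch shows one conceivable way to turn variable-size pages into aligned write requests: pages are staged in a buffer, and only whole aligned blocks are written. This is an illustration under stated assumptions, not necessarily the strategy presented in the talk. The 4096-byte alignment and the helper names (`align_down`, `align_up`, `is_aligned_request`, `plan_writes`) are hypothetical; the actual alignment depends on the device and file system.

```python
# Assumed alignment for illustration only; real requirements depend on the
# device and file system.
ALIGNMENT = 4096


def align_down(n, alignment=ALIGNMENT):
    """Largest multiple of `alignment` that is <= n."""
    return n - (n % alignment)


def align_up(n, alignment=ALIGNMENT):
    """Smallest multiple of `alignment` that is >= n."""
    return align_down(n + alignment - 1, alignment)


def is_aligned_request(offset, nbytes, address, alignment=ALIGNMENT):
    """Check the three quantities named in the abstract: file offset,
    byte count, and userspace buffer address."""
    return all(v % alignment == 0 for v in (offset, nbytes, address))


def plan_writes(page_sizes, alignment=ALIGNMENT):
    """Stage variable-size pages and emit only aligned (offset, nbytes) writes.

    Returns the list of writes and the number of bytes still staged at the
    end, which did not fill a whole block.
    """
    writes = []
    file_offset = 0  # stays a multiple of `alignment` by construction
    staged = 0       # bytes accumulated in the staging buffer
    for size in page_sizes:
        staged += size
        full = align_down(staged, alignment)
        if full > 0:
            writes.append((file_offset, full))
            file_offset += full
            staged -= full
    return writes, staged


# Hypothetical compressed page sizes in bytes.
writes, tail = plan_writes([1000, 5000, 3000, 200])
print(writes)  # [(0, 4096), (4096, 4096)]
print(tail)    # 1008
```

The remaining tail is the awkward part: in this sketch it would have to be padded up to `align_up(tail)` bytes for a final aligned write, or written through a different path, before the file is complete.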

Primary authors

Jonas Hahnfeld (CERN & Goethe University Frankfurt)
Jakob Blomer (CERN)
Philippe Canal (Fermi National Accelerator Lab. (US))
Thorsten Kollegger (Goethe University Frankfurt)
