25–29 May 2026
Chulalongkorn University
Asia/Bangkok timezone

Efficient Data Layouts and heterogeneous data handling in CMSSW

26 May 2026, 16:51
18m
Chulalongkorn University

Oral Presentation, Track 3 - Offline data processing

Speaker

Felice Pantaleo (CERN)

Description

The High-Luminosity LHC will vastly increase both the volume and complexity of data to be processed within the CMS software framework (CMSSW), pushing computational throughput to its limits. Efficient use of accelerator hardware, especially GPUs, will be central to sustaining reconstruction and analysis performance under these conditions. Among the most impactful design choices for GPU-accelerated workloads is data layout, as memory-access patterns strongly influence the achievable level of coalesced reads and overall hardware utilization. Structure-of-Arrays (SoA) layouts naturally align with these requirements thanks to their contiguous, field-wise organization.
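To illustrate why field-wise organization matters, the following is a minimal CPU-side sketch contrasting an Array-of-Structures layout with a Structure-of-Arrays layout. The types `HitAoS` and `HitsSoA` and the functions `toSoA` and `totalEnergy` are hypothetical names chosen for illustration; they are not CMSSW types. The point is only the address stride between consecutive values of one field: on a GPU, neighbouring threads that each read `energy[i]` touch adjacent addresses in the SoA case, which is the pattern that coalesces.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array-of-Structures: the four fields of one hit sit next to each other.
struct HitAoS {
  float x, y, z, energy;
};

// Structure-of-Arrays: one contiguous column per field.
struct HitsSoA {
  std::vector<float> x, y, z, energy;

  explicit HitsSoA(std::size_t n) : x(n), y(n), z(n), energy(n) {}
  std::size_t size() const { return energy.size(); }
};

// Transpose an AoS collection into SoA columns (this copies the data once).
HitsSoA toSoA(const std::vector<HitAoS>& hits) {
  HitsSoA soa(hits.size());
  assert(soa.size() == hits.size());
  for (std::size_t i = 0; i < hits.size(); ++i) {
    soa.x[i] = hits[i].x;
    soa.y[i] = hits[i].y;
    soa.z[i] = hits[i].z;
    soa.energy[i] = hits[i].energy;
  }
  return soa;
}

// AoS: consecutive energies are sizeof(HitAoS) bytes apart, so a loop over
// one field also pulls the unused x, y, z values through the memory system.
float totalEnergy(const std::vector<HitAoS>& hits) {
  float sum = 0.f;
  for (const HitAoS& h : hits) sum += h.energy;
  return sum;
}

// SoA: consecutive energies are sizeof(float) bytes apart, so every byte
// fetched belongs to the field being processed.
float totalEnergy(const HitsSoA& hits) {
  float sum = 0.f;
  for (float e : hits.energy) sum += e;
  return sum;
}
```

Both overloads return the same sum; what differs is the stride between the values they read (16 bytes versus 4 bytes for this four-float example).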

In this work, we present a generic and extensible SoA backend based on the Boost Preprocessor library, enabling highly portable and strongly typed data representations. The new system introduces MultiView, a mechanism that groups multiple SoA collections with identical schemas into a single logical entity. This abstraction removes the need for costly data reshaping, streamlines inter-module communication, and simplifies the design of downstream algorithms. A key outcome of this design is seamless interoperability with ML frameworks like PyTorch and SOFIE: SoA structures can be directly exposed as Tensors without transformation or memory copies, enabling fast heterogeneous inference workflows where machine-learning models operate natively on CMSSW event data.
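The grouping idea can be sketched in a few lines of C++. This is not the MultiView implementation described above, whose API is not given here; `TrackView`, `TrackMultiView`, `ColumnTensor` and `ptColumn` are hypothetical names, and the two-column schema `{pt, eta}` is an assumption made for the example. The sketch shows one plausible way to address several same-schema collections through a single global index without copying, and how a column can be described by pointer and length so that an external framework could wrap the existing memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical non-owning view of one SoA collection with schema {pt, eta}.
struct TrackView {
  const float* pt;
  const float* eta;
  std::size_t size;
};

// Hypothetical stand-in for the grouping idea: several views with the same
// schema behave like one collection; no element is copied or reshaped.
class TrackMultiView {
public:
  explicit TrackMultiView(std::vector<TrackView> views) : views_(std::move(views)) {
    // Prefix sums of the collection sizes: offsets_[c] is the global index
    // of the first element of collection c.
    offsets_.reserve(views_.size() + 1);
    offsets_.push_back(0);
    for (const TrackView& v : views_) offsets_.push_back(offsets_.back() + v.size);
  }

  std::size_t size() const { return offsets_.back(); }

  float pt(std::size_t i) const {
    const auto loc = locate(i);
    return views_[loc.first].pt[loc.second];
  }

  float eta(std::size_t i) const {
    const auto loc = locate(i);
    return views_[loc.first].eta[loc.second];
  }

private:
  // Map a global index to (collection index, local index).
  std::pair<std::size_t, std::size_t> locate(std::size_t i) const {
    assert(i < size());
    // First offset strictly greater than i; the collection just before it owns i.
    const auto it = std::upper_bound(offsets_.begin(), offsets_.end(), i);
    const std::size_t c = static_cast<std::size_t>(std::distance(offsets_.begin(), it)) - 1;
    return {c, i - offsets_[c]};
  }

  std::vector<TrackView> views_;
  std::vector<std::size_t> offsets_;
};

// Hypothetical 1-D tensor descriptor: pointer plus length is the information
// a framework needs in order to wrap a buffer in place rather than copy it.
struct ColumnTensor {
  const float* data;
  std::size_t length;
};

// Describe the pt column of one collection without touching its contents.
ColumnTensor ptColumn(const TrackView& v) { return {v.pt, v.size}; }
```

A downstream algorithm would loop from `0` to `size()` and call `pt(i)` or `eta(i)` without knowing how many underlying collections exist. A descriptor such as `ColumnTensor` carries the kind of information that a wrapping call like PyTorch's `torch::from_blob` takes (a data pointer and sizes); how the actual backend performs this exposure is not specified in the abstract.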

Beyond in-memory layout optimization, we also investigate integration of NVIDIA GPUDirect Storage (GDS) to establish a direct, high-bandwidth I/O path between GPU memory and local or remote storage. By relieving the CPU of data-movement responsibilities, GDS has the potential to reduce latency and improve performance in I/O-bound workflows, an increasingly relevant challenge as CMS moves toward HL-LHC data rates.

Bibliography:
[1] M. Holzer, L. Beltrame, A. Bocci, F. Pantaleo, and S. Balducci, "User Story: Integration of ROOT RNTuple to CMSSW's SoA data structures," Nov. 2025.
[2] L. Beltrame, F. Pantaleo, A. Bocci, and E. Cano, "Evolution of Data Structures for Heterogeneous Reconstruction in CMSSW," 2025. doi: 10.17181/kd13h-42e08.
