PyHEP.dev 2025 - "Python in HEP" Developer's Workshop

Name: PyHEP.dev 2025 - "Python in HEP" Developer's Workshop
Start: 2025-07-14T07:30:00-07:00
End: 2025-07-17T13:00:00-07:00
Location: Seattle, Washington

US/Pacific timezone

Contact

pyhepdev2025-organisation@cern.ch

Lazy Data Loading with "Virtual Arrays" in Awkward

14 Jul 2025, 10:35

20m

Seattle, Washington

University of Washington

Talks

Speaker: Iason Krommydas (Rice University (US))

High-energy physics (HEP) analyses frequently handle datasets that exceed available computing resources, which calls for specialized techniques for efficient data handling. Awkward Array, a widely adopted Python library in the HEP community, manages complex, irregularly structured ("ragged") data by mapping flat arrays into nested structures that represent physical objects such as particles and their associated properties. Analyses typically use only a subset of these objects and properties, which creates an opportunity to reduce memory usage through lazy data loading.
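
The mapping from flat arrays to nested structures can be sketched in plain Python. This is only an illustration of the general offsets-plus-content idea, not Awkward Array's API; the names `muon_pt`, `offsets`, and `event_view`, and all values, are hypothetical.

```python
# Flat storage for three events holding 2, 0, and 3 muons (hypothetical values).
muon_pt = [41.2, 17.8, 63.0, 25.5, 9.1]  # one flat "content" buffer
offsets = [0, 2, 2, 5]                    # event i owns content[offsets[i]:offsets[i + 1]]


def event_view(content, offsets, i):
    """Return the slice of the flat buffer that belongs to event i."""
    return content[offsets[i]:offsets[i + 1]]


# Rebuild the nested (ragged) view: one list of muon pT values per event.
nested = [event_view(muon_pt, offsets, i) for i in range(len(offsets) - 1)]
print(nested)
```

The second event is empty because its start and stop offsets coincide, which is how a ragged layout can represent events with no objects of a given type.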

In this presentation, we introduce Awkward Array's newly developed "Virtual Arrays" feature, designed for lazy loading of data buffers. Instead of loading entire datasets into memory up front, Virtual Arrays defer data retrieval from disk until a computation requests it. We will discuss the underlying architecture, design considerations, and practical implementation of Virtual Arrays, and highlight their integration into analysis workflows.
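
The deferred-retrieval idea can be illustrated with a small stand-in written in plain Python. This is a conceptual sketch only and does not reproduce the actual Virtual Arrays implementation; `LazyBuffer`, `read_jet_pt`, and the values are hypothetical, and the loader function merely stands in for a disk read.

```python
class LazyBuffer:
    """Hypothetical stand-in for a virtual buffer: holds a loader, not data."""

    def __init__(self, loader):
        self._loader = loader  # zero-argument callable that reads the data
        self._data = None
        self.load_count = 0    # how many times the loader actually ran

    @property
    def is_materialized(self):
        return self._data is not None

    def materialize(self):
        # Run the loader on first access only, then reuse the cached result.
        if self._data is None:
            self._data = self._loader()
            self.load_count += 1
        return self._data

    def __getitem__(self, index):
        return self.materialize()[index]


def read_jet_pt():
    # Assumption: stands in for an expensive read from disk.
    return [55.0, 32.5, 120.1]


jet_pt = LazyBuffer(read_jet_pt)
before = jet_pt.is_materialized  # nothing has been read yet
first = jet_pt[0]                # first access triggers the read
again = jet_pt[2]                # served from the cached buffer
```

Caching the loaded buffer is a design choice of this sketch: it trades memory for avoiding repeated reads when the same buffer is accessed more than once.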

We will illustrate how developers and analysts can incorporate lazy data loading into their existing frameworks using Coffea, the Columnar Object Framework For Effective Analysis. Coffea supports efficient event data processing through columnar operations and scales computations from personal laptops to large distributed computing environments without modifications to analysis code. Real-world examples from high-energy physics, including selective data processing and efficient histogramming, will show the technical implications and performance improvements that Virtual Arrays bring to data-intensive analyses in collider experiments.
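
Why selective processing benefits from lazy loading can be shown with a toy example in plain Python. It does not use Coffea or Awkward Array; the column names (`met`, `n_jets`, `run`), the values, and the helpers `make_loader` and `histogram` are all hypothetical. The point is only that columns the analysis never touches are never read.

```python
loaded = []  # records which columns were actually read


def make_loader(name, values):
    """Return a zero-argument loader that logs its use (stand-in for a disk read)."""
    def load():
        loaded.append(name)
        return values
    return load


# Hypothetical per-event columns; nothing is read until a loader is called.
columns = {
    "met": make_loader("met", [12.0, 48.5, 75.0, 110.0, 33.0]),
    "n_jets": make_loader("n_jets", [1, 3, 2, 4, 0]),
    "run": make_loader("run", [1, 1, 1, 2, 2]),
}


def histogram(values, edges):
    """Count values in half-open bins [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts


# The analysis touches only two of the three columns.
met = columns["met"]()
n_jets = columns["n_jets"]()
selected = [m for m, n in zip(met, n_jets) if n >= 2]  # keep events with >= 2 jets
counts = histogram(selected, [0, 50, 100, 150])
```

After this runs, `loaded` contains only `met` and `n_jets`; the `run` column was never requested, so its loader never executed.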

Author: Iason Krommydas (Rice University (US))

Co-authors: Ianna Osborne (Princeton University), Manfred Peter Fackeldey (Princeton University (US))

Presentation materials: 2025Jul14_Krommydas_PyHEPdev_virtual_arrays.pdf
