PyHEP.dev 2025 - "Python in HEP" Developer's Workshop

Name: PyHEP.dev 2025 - "Python in HEP" Developer's Workshop
Start: 2025-07-14T07:30:00-07:00
End: 2025-07-17T13:00:00-07:00
Location: Seattle, Washington

14–17 Jul 2025

Seattle, Washington

US/Pacific timezone

Contact

pyhepdev2025-organisation@cern.ch

Using Commodity Data Tools in LEGEND-1000

15 Jul 2025, 11:50

20m

Seattle, Washington

University of Washington

Talks

Isaac Kenneth Kunen

The current phase of the LEGEND neutrinoless double-beta decay search, LEGEND-200, stores its primary experimental data in a customized HDF5 format. This requires the team to build and maintain a significant custom data access layer that lies outside the team’s core physics mission and expertise, and the performance and complexity of that layer affect both the data production pipelines and the analysis of the data.

Multi-petabyte data sets like the ones LEGEND will amass were once outliers but are now common in industry, and the database community has produced a wealth of tools for handling them. For the future phase of the project, LEGEND-1000, we’re exploring how to leverage these tools to improve performance and functionality while reducing the cost to the team.

In this discussion, we’ll give an overview of our early work using vanilla Parquet with Hive partitioning (and possibly Iceberg) for storage, off-the-shelf Python data access and coordination systems such as DuckDB and PySpark to process and query the data, and standard OCI containers to simplify deployment across environments.
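Hive partitioning encodes partition columns as key=value directory names, so a query engine can skip whole directories when a filter matches a partition column. The following is a minimal standard-library sketch of that layout and of directory-level pruning, offered only as an illustration. The partition keys `period` and `run`, the root `legend_data`, and the helper names are hypothetical and not taken from the talk. In practice, DuckDB and PySpark discover and prune Hive partitions themselves when reading a partitioned Parquet dataset.

```python
from pathlib import PurePosixPath


def partition_path(root: str, partitions: dict, filename: str) -> str:
    """Build a Hive-style path: root/key1=value1/key2=value2/filename."""
    parts = [f"{key}={value}" for key, value in partitions.items()]
    return str(PurePosixPath(root, *parts, filename))


def parse_partitions(path: str) -> dict:
    """Recover partition key/value pairs from the key=value directory names."""
    result = {}
    for part in PurePosixPath(path).parent.parts:
        if "=" in part:
            key, _, value = part.partition("=")
            result[key] = value  # values come back as strings, as in the path
    return result


def prune(paths: list, **filters) -> list:
    """Keep only files whose partition values match every filter.

    Filter values are compared as strings, because that is how they
    are encoded in the directory names.
    """
    selected = []
    for path in paths:
        values = parse_partitions(path)
        if all(values.get(key) == str(value) for key, value in filters.items()):
            selected.append(path)
    return selected


# Hypothetical dataset: two periods, each with two runs, one Parquet file per run.
files = [
    partition_path("legend_data", {"period": p, "run": r}, "part-0.parquet")
    for p in (3, 4)
    for r in (0, 1)
]
```

For example, `prune(files, period=4, run=1)` selects only `legend_data/period=4/run=1/part-0.parquet` by inspecting directory names, without opening any file. This is the same kind of skipping a query engine can apply when a filter on a partition column is pushed down to the scan.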

Isaac Kenneth Kunen

PyHEP-LEGEND1000.pdf

PyHEP-LEGEND1000.pptx
