Please join Zulip chat:
https://pyhep2025.zulipchat.com/join/z4trmz2ufs7wqde2bcsy2mph/
The ROOT software package features automatic and dynamic Python bindings that provide access to its powerful and performant C++ core. With the growing adoption of Python in the HEP community, ROOT continues to evolve to offer a more intuitive and Pythonic user experience.
Recent developments make key components of the framework more accessible and interoperable from Python. This includes...
RNTuple is a new columnar data storage format with a variety of improvements over TTree. The first stable version of the specification became available earlier this year, so the transition to RNTuple has now begun. Thanks to RNTuple's modern and simple design, the Uproot Python library aims to provide much better support for reading and writing RNTuples than it did for TTrees. Uproot already...
uproot-custom is an extension of Uproot that allows users to define custom behaviors when reading branch data from ROOT files. This capability is particularly useful when handling classes with overloaded Streamer methods or when specific data transformations are required during the reading process. Implemented in both C++ and Python, uproot-custom ensures both high performance and...
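To illustrate the kind of byte-level logic such a custom reader encapsulates, here is a toy sketch in pure Python (not uproot-custom's actual API): a hypothetical payload written by a custom Streamer as a count followed by big-endian float32 values, which a custom reader decodes directly from the raw bytes.

```python
import struct

# Hypothetical raw payload for one object written by a custom Streamer:
# a uint32 count followed by that many big-endian float32 values
# (ROOT files store data big-endian).
payload = struct.pack(">I3f", 3, 1.5, 2.5, 3.5)

def read_custom(buf):
    """Toy custom reader: decode the count, then the float values."""
    (n,) = struct.unpack_from(">I", buf, 0)
    return list(struct.unpack_from(f">{n}f", buf, 4))

values = read_custom(payload)  # [1.5, 2.5, 3.5]
```

In the real library this decoding would be registered so it runs automatically whenever the corresponding class is encountered in a branch.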
CO2-based two-phase pumped loop systems are now the de facto solution for detector cooling at CERN. The scope of these systems grows ever larger, and with it, so does the complexity of the underlying technology.
For the past decade and a half, MATLAB has been our one-stop shop for simulations, post-processing, data analysis, and data visualisation. Recently, we have begun a piecemeal...
The CMS Experiment introduced a new lightweight format for physics analysis, NanoAOD, during Run 2. Stored as ROOT TTrees, NanoAOD can be read directly with ROOT or with Python libraries such as uproot. Current CMS event displays rely on the larger MiniAOD data tier, which requires CMS-specific software and resources and includes information not available in NanoAOD.
ISpy NanoAOD is a...
The PyLHE library (Python LHE interface) has seen major improvements since 2024. Recent releases introduced LHE file writing (v0.9.0) and extended event weight support for POWHEG (v0.8.0). Event weights, when available, are now included in the output Awkward Arrays, and systematic tests are performed using LHE files from widely used general-purpose Monte Carlo event generators. In addition to...
This talk explores the results of my recent project as an IRIS-HEP fellow, in which I worked on improving the coffea schemas by simplifying how they work internally. The project eventually grew into a new package that collects all the simplified schemas, separated from coffea; coffea will eventually use them instead of its old schemas. This new package was given the name 'zipper' and...
While advancements in software development practices across particle physics and the adoption of Linux container technology have made it substantially easier to replicate and reuse analysis software stacks, the underlying software environments are still primarily bespoke builds that lack a full manifest to ensure reproducibility across time. Pixi is a new...
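As a sketch of the idea, a minimal Pixi manifest pinning an analysis environment might look like the following (illustrative only; the package names and versions here are assumptions, not taken from the talk):

```toml
# Hypothetical pixi.toml for a reproducible analysis environment.
[project]
name = "my-analysis"
channels = ["conda-forge"]
platforms = ["linux-64"]

[dependencies]
python = "3.12.*"
numpy = ">=2.0"
root = "*"

[tasks]
analyze = "python run_analysis.py"
```

Pixi resolves this into a lock file, so the exact same environment can be rebuilt later on any listed platform.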
This talk covers the histogram serialization that has been added to the latest versions of boost-histogram, hist, and uhi. We'll see how you can serialize and deserialize histograms to multiple formats. We'll also look at related recent advancements, such as the new cross-library tests provided in uhi.
We'll take a deeper look at the new serialization specification that was developed in UHI,...
PyTrees are a powerful mechanism for working with nested data structures, while allowing algorithms like finite-differences, minimization, and integration routines to run on flattened 1D arrays of the same data. The Scikit-HEP vector package recently added pytree support through optree. In this lightning talk, we'll introduce pytrees, show an example of usage, and discuss opportunities for...
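The core pytree operations are flatten and unflatten. As a minimal pure-Python sketch of the concept (optree and JAX provide optimized versions of exactly this pattern), a nested structure is decomposed into a flat list of leaves plus a "treedef" that records how to rebuild it:

```python
def tree_flatten(tree):
    """Flatten nested dicts/lists/tuples into (leaves, treedef)."""
    if isinstance(tree, dict):
        leaves, defs = [], []
        for key in sorted(tree):
            sub_leaves, sub_def = tree_flatten(tree[key])
            leaves += sub_leaves
            defs.append((key, sub_def))
        return leaves, ("dict", defs)
    if isinstance(tree, (list, tuple)):
        leaves, defs = [], []
        for item in tree:
            sub_leaves, sub_def = tree_flatten(item)
            leaves += sub_leaves
            defs.append(sub_def)
        return leaves, (type(tree).__name__, defs)
    return [tree], ("leaf", None)

def tree_unflatten(treedef, leaves):
    """Rebuild the nested structure from a flat list of leaves."""
    it = iter(leaves)
    def build(td):
        kind, defs = td
        if kind == "leaf":
            return next(it)
        if kind == "dict":
            return {key: build(sub) for key, sub in defs}
        seq = [build(sub) for sub in defs]
        return tuple(seq) if kind == "tuple" else seq
    return build(treedef)

# A nested "vector-like" record: flatten, operate on the 1D leaves, rebuild.
point = {"pt": 31.5, "eta": [0.5, -0.25], "phi": 2.5}
leaves, treedef = tree_flatten(point)
scaled = tree_unflatten(treedef, [2 * x for x in leaves])
```

This is what lets a minimizer see a nested physics object as a plain 1D array while the user keeps working with named fields.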
Machine learning is advancing at a breathtaking pace, and navigating the ever-growing ecosystem of Python tools can be time consuming. This talk offers a practical guide to the ML landscape most relevant to high-energy physics. We discuss:
- Common ML frameworks including PyTorch, PyTorch Lightning, Keras, JAX, and scikit-learn: strengths, weaknesses, and how to choose
- **ML...
The ROOT software framework is widely used from Python in HEP for storage, processing, analysis and visualization of large datasets. With the rapid growth of ML usage from the Python ecosystem in experiment workflows, especially in the final steps of the analysis pipeline, exposing ROOT data ergonomically to ML models becomes ever more pressing. In this contribution...
High Energy Physics analyses frequently rely on large-scale datasets stored in ROOT format, while modern machine learning workflows are increasingly built around PyTorch and its data pipeline abstractions. This disconnect between domain-specific storage and general-purpose ML frameworks creates a barrier to efficient end-to-end workflows.
We introduce F9columnar...
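A minimal building block of any such data pipeline is a batching iterator that turns a stream of events into fixed-size chunks. As a stdlib-only sketch of the pattern (not F9columnar's or PyTorch's API), assuming the events arrive as any iterable:

```python
from itertools import islice

def batched(events, batch_size):
    """Yield fixed-size lists from an event stream; the last batch
    may be shorter. A columnar DataLoader builds on this pattern."""
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

chunks = list(batched(range(7), 3))  # [[0, 1, 2], [3, 4, 5], [6]]
```

In a real pipeline the "events" would be column chunks read from ROOT files, and batching would be combined with shuffling and prefetching.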
Statistical modeling is central to discovery in particle physics, yet the tools commonly used to define, share, and evaluate these models are often complex, fragmented, or tightly coupled to legacy systems. In parallel, the scientific Python community has developed a variety of statistical modeling tools that have been widely adopted for their performance and ease of use, but remain...
Automatic differentiation, the technique behind modern deep learning, can be applied more broadly in High Energy Physics (HEP) to make entire analysis pipelines differentiable. This enables direct optimization of analysis choices such as selection thresholds, binning strategies, and systematic treatments by propagating gradients through the statistical analysis chain.
This talk will...
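The key trick that makes a selection threshold differentiable is replacing the hard cut (0-or-1 per event) with a smooth weight. As a tiny stdlib sketch of the idea (illustrative; the values and the slope parameter are assumptions), each event gets a sigmoid weight, so the expected yield varies smoothly with the threshold and its gradient can be estimated or propagated:

```python
import math

def soft_count(pts, threshold, slope=5.0):
    """Differentiable stand-in for the hard cut pt > threshold:
    each event contributes a sigmoid weight instead of 0 or 1."""
    return sum(1.0 / (1.0 + math.exp(-slope * (pt - threshold)))
               for pt in pts)

pts = [18.0, 22.0, 25.0, 31.0, 40.0]

# Central finite difference: d(count)/d(threshold) at threshold = 20.
eps, t = 1e-6, 20.0
grad = (soft_count(pts, t + eps) - soft_count(pts, t - eps)) / (2 * eps)
```

In a real differentiable pipeline an autodiff framework computes this gradient exactly and propagates it all the way through the statistical model.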
evermore is a software package for statistical inference using likelihood functions of binned data. It fulfils three key concepts: performance, differentiability, and object-oriented statistical model building. evermore is built on JAX, a powerful automatic-differentiation Python framework. By making every component in evermore a "PyTree", each component can be jit-compiled (jax.jit),...
The High-Luminosity LHC era will deliver unprecedented data volumes, enabling measurements on fine-grained multidimensional histograms containing millions of bins with thousands of events each. Achieving ultimate precision requires modeling thousands of systematic uncertainty sources, creating computational challenges for likelihood maximization and inference. Fast optimization is crucial for...
PocketCoffea is an analysis framework based on Coffea for CMS NanoAOD events. It relies on a BaseProcessor class which processes the NanoAOD files in a columnar fashion. 
PocketCoffea defines a Configurator class to handle parameter and analysis-workflow configuration, such as dataset definitions, object and event selection, event weights, systematic uncertainties, and output histogram...
Luigi is a powerful workflow tool for data analyses. Yet, it has some limitations that become quite debilitating in larger and more complex workflows. The PyHEP.dev 2024 Talk waluigi - Beyond luigi outlined some basic principles and ideas that sought to address these shortcomings. Together with the feedback gathered from the...
Scattering amplitudes encode the chances of different outcomes when
particles collide. Calculating them to the precision required by
current and future colliders is extremely challenging: the
intermediate steps explode in size and become unwieldy even for modern
computers. Yet the final answers often turn out to be surprisingly
simple and efficient to use, if only they can be...
High-energy physics analyses involve complex computations over large, irregular, nested data structures. Libraries such as Awkward Array have demonstrated that the massive parallelism of GPUs can be applied to accelerate these analyses. However, today this requires significant expertise from both library developers and end users, who must navigate the low-level details of CUDA kernel...
Modern high-energy physics workflows rely heavily on large-scale computation, where performance bottlenecks often emerge as data sizes grow. This talk explores various dispatching mechanisms incorporated in libraries like NetworkX (graphs), NumPy (arrays), and scikit-image,...
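The libraries above implement elaborate backend systems, but the core idea of dispatching, selecting an implementation based on the input type, can be sketched with the standard library's `functools.singledispatch` (a simplified analogy, not how NetworkX backends actually work):

```python
from functools import singledispatch

@singledispatch
def total(data):
    """Compute a sum; concrete backends register themselves by type."""
    raise TypeError(f"no backend registered for {type(data).__name__}")

@total.register(list)
def _(data):
    # Reference pure-Python backend.
    return sum(data)

try:
    import numpy as np

    @total.register(np.ndarray)
    def _(data):
        # Vectorized backend, picked automatically for arrays.
        return float(data.sum())
except ImportError:
    pass  # numpy backend simply unavailable

result = total([1, 2, 3])  # dispatches to the list backend
```

Real dispatching layers add backend discovery via entry points and environment variables, so users can switch to, say, a GPU backend without changing their code.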
The formulate Python package was released in 2018, aiming to be a translation tool between the C++ expressions used by ROOT and the Python counterparts used in the Scikit-HEP ecosystem. It worked well for simple expressions, but had serious performance issues when expressions were lengthy and complex. Last year, there was an effort to rewrite the package from scratch to solve these performance...
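To give a flavor of the translation problem (this is a deliberately tiny regex-based toy, not formulate's implementation, which parses expressions properly), ROOT-style boolean operators and function names can be mapped to their numexpr-style counterparts:

```python
import re

# Ordered replacements mapping ROOT/TTreeFormula syntax to
# numexpr-style syntax. Toy rules only; a real translator parses.
RULES = [
    (r"&&", "&"),
    (r"\|\|", "|"),
    (r"!(?![=])", "~"),        # logical not, but leave "!=" alone
    (r"\bTMath::Abs\b", "abs"),
]

def root_to_numexpr(expr):
    for pattern, repl in RULES:
        expr = re.sub(pattern, repl, expr)
    return expr

translated = root_to_numexpr("(pt > 20) && (TMath::Abs(eta) < 2.4)")
```

String substitution breaks down quickly on nested or ambiguous expressions, which is exactly why formulate builds a proper parse tree instead.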
We’re planning a hands-on session to explore Awkward Array’s internals, contribute to development, or just learn how it works.
Vote for what you’d like to focus on: GitHub poll link
Options include array internals, performance hacks, GPU/Numba integration, extending Awkward, debugging, interoperability, or just learning the basics.
This tutorial will provide a comprehensive introduction to the current state of Coffea (Columnar Object Framework for Effective Analysis), focusing on its transition to virtual arrays as the primary backend for efficient HEP data processing. With the introduction of Awkward Array's Virtual Arrays feature, Coffea now offers lazy data loading capabilities that dramatically reduce memory...
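The essence of a virtual array is to hold a recipe for loading data rather than the data itself, deferring I/O until a computation actually touches it. As a minimal sketch of the concept (not Awkward Array's implementation; the class and names here are illustrative), a proxy can wrap a zero-argument loader and materialize on first access:

```python
import numpy as np

class VirtualArray:
    """Toy virtual array: holds a loader callable and only invokes it
    the first time the data is actually needed."""

    def __init__(self, loader):
        self._loader = loader
        self._data = None

    @property
    def materialized(self):
        return self._data is not None

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this when the proxy enters a computation.
        if self._data is None:
            self._data = np.asarray(self._loader())
        return self._data if dtype is None else self._data.astype(dtype)

loads = []

def load_pt():
    loads.append("pt")  # record that I/O actually happened
    return [21.0, 34.5, 47.0]

pt = VirtualArray(load_pt)
before = pt.materialized        # False: nothing read yet
mean = float(np.mean(pt))       # first real use triggers the load
after = pt.materialized         # True: loaded exactly once
```

Applied across thousands of branches, this is what lets an analysis open a dataset cheaply and pay I/O costs only for the columns it actually uses.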
ATLAS analysis in Run 2 was chaotic. ATLAS Run 3 and beyond has started to consolidate to a few common frameworks that are maintained more centrally. The two most popular analysis frameworks are currently TopCPToolkit and easyjet. Both are configurable with YAML; the former is part of ATLAS's offline software (athena), while the latter is developed primarily for use by Higgs/di-Higgs...
Data analysis in High Energy Physics is constrained by the scalability of systems that rely on a single, static workflow graph. This representation is rigid, struggles with overhead when applied to workflows involving large data, and can be slow to construct (such as with Dask). To overcome this, we introduce Dynamic Data Reduction (DDR), built upon the common pattern in event processing. This...