Please join Zulip chat:
https://pyhep2025.zulipchat.com/join/z4trmz2ufs7wqde2bcsy2mph/
The ROOT software package features automatic and dynamic Python bindings that provide access to its powerful and performant C++ core. With the growing adoption of Python in the HEP community, ROOT continues to evolve to offer a more intuitive and Pythonic user experience.
Recent developments make key components of the framework more accessible and interoperable from Python. This includes...
RNTuple is a new columnar data storage format with a variety of improvements over TTree. The first stable version of the specification became available earlier this year, so the transition to RNTuple has now begun. Thanks to RNTuple's modern and simple design, the Uproot Python library aims to provide much better support for reading and writing RNTuples than it did for TTrees. Uproot already...
uproot-custom is an extension of Uproot that allows users to define custom behaviors when reading branch data from ROOT files. This capability is particularly useful when handling classes with overloaded Streamer methods or when specific data transformations are required during the reading process. Implemented in both C++ and Python, uproot-custom ensures both high performance and...
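To illustrate the kind of byte-level logic such a custom reader encapsulates, here is a toy sketch in pure Python (not uproot-custom's actual API): a hypothetical payload written by a custom Streamer as a count followed by big-endian float32 values, which a custom reader decodes directly from the raw bytes.

```python
import struct

# Hypothetical raw payload for one object written by a custom Streamer:
# a uint32 count followed by that many big-endian float32 values
# (ROOT files store data big-endian).
payload = struct.pack(">I3f", 3, 1.5, 2.5, 3.5)

def read_custom(buf):
    """Toy custom reader: decode the count, then the float values."""
    (n,) = struct.unpack_from(">I", buf, 0)
    return list(struct.unpack_from(f">{n}f", buf, 4))

values = read_custom(payload)  # [1.5, 2.5, 3.5]
```

In the real library this decoding would be registered so it runs automatically whenever the corresponding class is encountered in a branch.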
CO2-based two-phase pumped loop systems are now the de facto solution for detector cooling at CERN. The scope of these systems grows ever larger, and with it, so does the complexity of the underlying technology.
For the past decade and a half, MATLAB has been our one-stop shop for simulations, post-processing, data analysis, and data visualisation. Recently, we have begun a piecemeal...
The CMS Experiment introduced a new lightweight format for physics analysis, NanoAOD, during Run 2. Stored as ROOT TTrees, NanoAOD can be read directly with ROOT or with Python libraries such as uproot. Current CMS event displays rely on the larger MiniAOD data tier, which requires CMS-specific software and resources and includes information not available in NanoAOD.
ISpy NanoAOD is a...
The PyLHE library (Python LHE interface) has seen major improvements since 2024. Recent releases introduced LHE file writing (v0.9.0) and extended event weight support for POWHEG (v0.8.0). Event weights, when available, are now included in the output Awkward Arrays, and systematic tests are performed using LHE files from widely used general-purpose Monte Carlo event generators. In addition to...
This talk explores the results of my recent project as an IRIS-HEP fellow, in which I worked on improving the coffea schemas by simplifying how they work internally. The project eventually grew into a new package that collects all the simplified schemas, separated from coffea; coffea will eventually use them instead of its old schemas. This new package was given the name 'zipper' and...
While advancements in software development practices across particle physics and the adoption of Linux container technology have made it substantially easier to replicate and reuse analysis software stacks, the underlying software environments are still primarily bespoke builds that lack a full manifest to ensure reproducibility across time. Pixi is a new...
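As a sketch of the idea, a minimal Pixi manifest pinning an analysis environment might look like the following (illustrative only; the package names and versions here are assumptions, not taken from the talk):

```toml
# Hypothetical pixi.toml for a reproducible analysis environment.
[project]
name = "my-analysis"
channels = ["conda-forge"]
platforms = ["linux-64"]

[dependencies]
python = "3.12.*"
numpy = ">=2.0"
root = "*"

[tasks]
analyze = "python run_analysis.py"
```

Pixi resolves this into a lock file, so the exact same environment can be rebuilt later on any listed platform.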
This talk covers the histogram serialization that has been added to the latest versions of boost-histogram, hist, and uhi. We'll see how you can serialize and deserialize histograms to multiple formats. We'll also look at related recent advancements, such as the new cross-library tests provided in uhi.
We'll take a deeper look at the new serialization specification that was developed in UHI,...
PyTrees are a powerful mechanism for working with nested data structures, while allowing algorithms like finite-differences, minimization, and integration routines to run on flattened 1D arrays of the same data. The Scikit-HEP vector package recently added pytree support through optree. In this lightning talk, we'll introduce pytrees, show an example of usage, and discuss opportunities for...
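The core pytree operations are flatten and unflatten. As a minimal pure-Python sketch of the concept (optree and JAX provide optimized versions of exactly this pattern), a nested structure is decomposed into a flat list of leaves plus a "treedef" that records how to rebuild it:

```python
def tree_flatten(tree):
    """Flatten nested dicts/lists/tuples into (leaves, treedef)."""
    if isinstance(tree, dict):
        leaves, defs = [], []
        for key in sorted(tree):
            sub_leaves, sub_def = tree_flatten(tree[key])
            leaves += sub_leaves
            defs.append((key, sub_def))
        return leaves, ("dict", defs)
    if isinstance(tree, (list, tuple)):
        leaves, defs = [], []
        for item in tree:
            sub_leaves, sub_def = tree_flatten(item)
            leaves += sub_leaves
            defs.append(sub_def)
        return leaves, (type(tree).__name__, defs)
    return [tree], ("leaf", None)

def tree_unflatten(treedef, leaves):
    """Rebuild the nested structure from a flat list of leaves."""
    it = iter(leaves)
    def build(td):
        kind, defs = td
        if kind == "leaf":
            return next(it)
        if kind == "dict":
            return {key: build(sub) for key, sub in defs}
        seq = [build(sub) for sub in defs]
        return tuple(seq) if kind == "tuple" else seq
    return build(treedef)

# A nested "vector-like" record: flatten, operate on the 1D leaves, rebuild.
point = {"pt": 31.5, "eta": [0.5, -0.25], "phi": 2.5}
leaves, treedef = tree_flatten(point)
scaled = tree_unflatten(treedef, [2 * x for x in leaves])
```

This is what lets a minimizer see a nested physics object as a plain 1D array while the user keeps working with named fields.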
Machine learning is advancing at a breathtaking pace, and navigating the ever-growing ecosystem of Python tools can be time consuming. This talk offers a practical guide to the ML landscape most relevant to high-energy physics. We discuss:
- Common ML frameworks including PyTorch, PyTorch Lightning, Keras, JAX, and scikit-learn: strengths, weaknesses, and how to choose
- **ML...
The ROOT software framework is widely used from Python in HEP for storage, processing, analysis and visualization of large datasets. With the rapid growth of ML usage from the Python ecosystem in experiment workflows, especially in the final steps of the analysis pipeline, exposing ROOT data ergonomically to ML models becomes ever more pressing. In this contribution...
High Energy Physics analyses frequently rely on large-scale datasets stored in ROOT format, while modern machine learning workflows are increasingly built around PyTorch and its data pipeline abstractions. This disconnect between domain-specific storage and general-purpose ML frameworks creates a barrier to efficient end-to-end workflows.
We introduce F9columnar...
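A minimal building block of any such data pipeline is a batching iterator that turns a stream of events into fixed-size chunks. As a stdlib-only sketch of the pattern (not F9columnar's or PyTorch's API), assuming the events arrive as any iterable:

```python
from itertools import islice

def batched(events, batch_size):
    """Yield fixed-size lists from an event stream; the last batch
    may be shorter. A columnar DataLoader builds on this pattern."""
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

chunks = list(batched(range(7), 3))  # [[0, 1, 2], [3, 4, 5], [6]]
```

In a real pipeline the "events" would be column chunks read from ROOT files, and batching would be combined with shuffling and prefetching.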
Statistical modeling is central to discovery in particle physics, yet the tools commonly used to define, share, and evaluate these models are often complex, fragmented, or tightly coupled to legacy systems. In parallel, the scientific Python community has developed a variety of statistical modeling tools that have been widely adopted for their performance and ease of use, but remain...
Automatic differentiation, the technique behind modern deep learning, can be applied more broadly in High Energy Physics (HEP) to make entire analysis pipelines differentiable. This enables direct optimization of analysis choices such as selection thresholds, binning strategies, and systematic treatments by propagating gradients through the statistical analysis chain.
This talk will...
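The key trick that makes a selection threshold differentiable is replacing the hard cut (0-or-1 per event) with a smooth weight. As a tiny stdlib sketch of the idea (illustrative; the values and the slope parameter are assumptions), each event gets a sigmoid weight, so the expected yield varies smoothly with the threshold and its gradient can be estimated or propagated:

```python
import math

def soft_count(pts, threshold, slope=5.0):
    """Differentiable stand-in for the hard cut pt > threshold:
    each event contributes a sigmoid weight instead of 0 or 1."""
    return sum(1.0 / (1.0 + math.exp(-slope * (pt - threshold)))
               for pt in pts)

pts = [18.0, 22.0, 25.0, 31.0, 40.0]

# Central finite difference: d(count)/d(threshold) at threshold = 20.
eps, t = 1e-6, 20.0
grad = (soft_count(pts, t + eps) - soft_count(pts, t - eps)) / (2 * eps)
```

In a real differentiable pipeline an autodiff framework computes this gradient exactly and propagates it all the way through the statistical model.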
evermore is a software package for statistical inference using likelihood functions of binned data. It fulfils three key concepts: performance, differentiability, and object-oriented statistical model building. evermore is built on JAX, a powerful automatic-differentiation Python framework. By making every component in evermore a "PyTree", each component can be jit-compiled (jax.jit),...
The High-Luminosity LHC era will deliver unprecedented data volumes, enabling measurements on fine-grained multidimensional histograms containing millions of bins with thousands of events each. Achieving ultimate precision requires modeling thousands of systematic uncertainty sources, creating computational challenges for likelihood maximization and inference. Fast optimization is crucial for...
PocketCoffea is an analysis framework based on Coffea for CMS NanoAOD events. It relies on a BaseProcessor class which processes the NanoAOD files in a columnar fashion. 
PocketCoffea defines a Configurator class to handle parameter and analysis-workflow configuration, such as dataset definitions, object and event selection, event weights, systematic uncertainties, and output histogram...
Luigi is a powerful workflow tool for data analyses. Yet, it has some limitations that become quite debilitating in larger and more complex workflows. The PyHEP.dev 2024 Talk waluigi - Beyond luigi outlined some basic principles and ideas that sought to address these shortcomings. Together with the feedback gathered from the...
Scattering amplitudes encode the chances of different outcomes when
particles collide. Calculating them to the precision required by
current and future colliders is extremely challenging: the
intermediate steps explode in size and become unwieldy even for modern
computers. Yet the final answers often turn out to be surprisingly
simple and efficient to use, if only they can be...
High-energy physics analyses involve complex computations over large, irregular, nested data structures. Libraries such as Awkward Array have demonstrated that the massive parallelism of GPUs can be applied to accelerate these analyses. However, today this requires significant expertise from both library developers and end users, who must navigate the low-level details of CUDA kernel...
Modern high-energy physics workflows rely heavily on large-scale computation, where performance bottlenecks often emerge as data sizes grow. This talk explores various dispatching mechanisms incorporated in libraries like NetworkX (graphs), NumPy (arrays), and scikit-image,...
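The libraries above implement elaborate backend systems, but the core idea of dispatching, selecting an implementation based on the input type, can be sketched with the standard library's `functools.singledispatch` (a simplified analogy, not how NetworkX backends actually work):

```python
from functools import singledispatch

@singledispatch
def total(data):
    """Compute a sum; concrete backends register themselves by type."""
    raise TypeError(f"no backend registered for {type(data).__name__}")

@total.register(list)
def _(data):
    # Reference pure-Python backend.
    return sum(data)

try:
    import numpy as np

    @total.register(np.ndarray)
    def _(data):
        # Vectorized backend, picked automatically for arrays.
        return float(data.sum())
except ImportError:
    pass  # numpy backend simply unavailable

result = total([1, 2, 3])  # dispatches to the list backend
```

Real dispatching layers add backend discovery via entry points and environment variables, so users can switch to, say, a GPU backend without changing their code.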
The formulate Python package was released in 2018, aiming to be a translation tool between the C++ expressions used by ROOT and the Python counterparts used in the Scikit-HEP ecosystem. It worked well for simple expressions, but had serious performance issues when expressions were lengthy and complex. Last year, there was an effort to rewrite the package from scratch to solve these performance...
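To give a flavor of the translation problem (this is a deliberately tiny regex-based toy, not formulate's implementation, which parses expressions properly), ROOT-style boolean operators and function names can be mapped to their numexpr-style counterparts:

```python
import re

# Ordered replacements mapping ROOT/TTreeFormula syntax to
# numexpr-style syntax. Toy rules only; a real translator parses.
RULES = [
    (r"&&", "&"),
    (r"\|\|", "|"),
    (r"!(?![=])", "~"),        # logical not, but leave "!=" alone
    (r"\bTMath::Abs\b", "abs"),
]

def root_to_numexpr(expr):
    for pattern, repl in RULES:
        expr = re.sub(pattern, repl, expr)
    return expr

translated = root_to_numexpr("(pt > 20) && (TMath::Abs(eta) < 2.4)")
```

String substitution breaks down quickly on nested or ambiguous expressions, which is exactly why formulate builds a proper parse tree instead.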
We’re planning a hands-on session to explore Awkward Array’s internals, contribute to development, or just learn how it works.
Vote for what you’d like to focus on: GitHub poll link
Options include array internals, performance hacks, GPU/Numba integration, extending Awkward, debugging, interoperability, or just learning the basics.
This tutorial will provide a comprehensive introduction to the current state of Coffea (Columnar Object Framework for Effective Analysis), focusing on its transition to virtual arrays as the primary backend for efficient HEP data processing. With the introduction of Awkward Array's Virtual Arrays feature, Coffea now offers lazy data loading capabilities that dramatically reduce memory...
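The essence of a virtual array is to hold a recipe for loading data rather than the data itself, deferring I/O until a computation actually touches it. As a minimal sketch of the concept (not Awkward Array's implementation; the class and names here are illustrative), a proxy can wrap a zero-argument loader and materialize on first access:

```python
import numpy as np

class VirtualArray:
    """Toy virtual array: holds a loader callable and only invokes it
    the first time the data is actually needed."""

    def __init__(self, loader):
        self._loader = loader
        self._data = None

    @property
    def materialized(self):
        return self._data is not None

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this when the proxy enters a computation.
        if self._data is None:
            self._data = np.asarray(self._loader())
        return self._data if dtype is None else self._data.astype(dtype)

loads = []

def load_pt():
    loads.append("pt")  # record that I/O actually happened
    return [21.0, 34.5, 47.0]

pt = VirtualArray(load_pt)
before = pt.materialized        # False: nothing read yet
mean = float(np.mean(pt))       # first real use triggers the load
after = pt.materialized         # True: loaded exactly once
```

Applied across thousands of branches, this is what lets an analysis open a dataset cheaply and pay I/O costs only for the columns it actually uses.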
ATLAS analysis in Run 2 was chaotic. ATLAS Run 3 and beyond has started to consolidate to a few common frameworks that are maintained more centrally. The two most popular analysis frameworks are currently TopCPToolkit and easyjet. Both are configurable with YAML; the former is part of ATLAS's offline software (athena), while the latter is developed primarily for use by Higgs/di-Higgs...
Data analysis in High Energy Physics is constrained by the scalability of systems that rely on a single, static workflow graph. This representation is rigid, struggles with overhead when applied to workflows involving large data, and can be slow to construct (such as with Dask). To overcome this, we introduce Dynamic Data Reduction (DDR), built upon the common pattern in event processing. This...