Awkward Array is a stable and widely used Python library for working with nested, variable-length, and irregular data: the kind of data that traditional NumPy arrays can't easily handle. Originally developed for high-energy physics, it has grown into a reliable tool for many fields beyond HEP.
Today, Awkward Array offers strong integration with libraries like NumPy, Numba, JAX, and GPU...
High-energy physics (HEP) analyses frequently manage massive datasets that surpass available computing resources, requiring specialized techniques for efficient data handling. [Awkward Array][1], a widely adopted Python library in the HEP community, effectively manages complex, irregularly structured ("ragged") data by mapping flat arrays into nested structures that intuitively represent...
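For readers unfamiliar with the library, here is a minimal sketch (the values are made up) of the ragged structures Awkward Array represents and the columnar operations it supports:

```python
import awkward as ak

# A ragged array: each "event" holds a variable number of values,
# something a rectangular NumPy array cannot express without padding.
pt = ak.Array([[21.3, 44.1], [], [12.9, 63.4, 7.5]])

# Columnar selection over all events at once, no Python loop.
selected = pt[pt > 20.0]       # [[21.3, 44.1], [], [63.4]]

# Per-event reductions respect the nesting.
n_selected = ak.num(selected)  # [2, 0, 1]
```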
As we pursue new physics at the LHC, the challenge of efficiently analyzing our rapidly mounting data volumes will continue to grow. This talk will describe the development and benchmarking of a realistic columnar-based end-user analysis workflow (for skimming Run 2 + Run 3 scale data with the Coffea framework) in order to characterize the current capabilities and understand bottlenecks as we...
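To make "columnar" concrete, the hedged sketch below expresses a skim as array operations rather than an event loop; it uses plain Awkward Array operations instead of the full Coffea machinery, and the branch names and cut values are illustrative only:

```python
import awkward as ak
import numpy as np

# Illustrative ragged event data (branch names and cuts are made up).
events = ak.Array({
    "Jet_pt":  [[55.0, 32.0], [18.0], [120.0, 44.0, 29.0]],
    "Jet_eta": [[0.4, -1.2],  [2.9],  [-0.1, 1.7, -2.3]],
})

# Object-level selection: central, high-pT jets.
good_jets = (events["Jet_pt"] > 30.0) & (np.abs(events["Jet_eta"]) < 2.4)

# Event-level selection (the skim): keep events with at least two good jets.
skimmed = events[ak.sum(good_jets, axis=1) >= 2]
```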
The rootfilespec package is designed to efficiently parse ROOT file binary data into Python data structures. It does not drive I/O and expects materialized byte buffers as input. It also does not return any types beyond Python dataclasses of primitive types (and NumPy arrays thereof). The goal of the project is to provide a stable and feature-complete read/write backend for packages such as uproot.
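rootfilespec's own API is not reproduced here; as a generic illustration of the pattern it describes (materialized bytes in, dataclasses of primitives and NumPy arrays out), the sketch below parses the start of a ROOT file header, which begins with the magic bytes `root` followed by big-endian integers:

```python
from dataclasses import dataclass
import struct
import numpy as np

@dataclass
class FileHeader:
    magic: bytes   # b"root" for a valid ROOT file
    version: int   # fVersion, big-endian int32
    begin: int     # fBEGIN, byte offset of the first data record

def parse_header(buffer: bytes) -> FileHeader:
    # The caller supplies materialized bytes; no I/O happens here.
    version, begin = struct.unpack(">ii", buffer[4:12])
    return FileHeader(magic=buffer[:4], version=version, begin=begin)

def read_int64s(buffer: bytes, offset: int, count: int) -> np.ndarray:
    # NumPy can reinterpret a slice of the buffer without copying.
    return np.frombuffer(buffer, dtype=">i8", count=count, offset=offset)
```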
RNTuple is a new columnar data storage format with a variety of improvements over TTree. The first stable version of the specification became available in ROOT 6.34, at the beginning of the year. Thus, we have entered the transition period in which our software migrates from TTrees to RNTuples. The Uproot Python library has stayed at the forefront of this transition, and already has fairly...
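As a hedged usage sketch (the file and RNTuple names are placeholders, and the exact method surface may vary between Uproot releases), reading an RNTuple is intended to look much like reading a TTree:

```python
import uproot

with uproot.open("events.root") as f:
    ntuple = f["Events"]                   # an RNTuple rather than a TTree
    arrays = ntuple.arrays(["pt", "eta"])  # Awkward Arrays, one per field
```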
Binned likelihoods (and optimizations thereof) in HEP offer various parallelization opportunities. This talk discusses those opportunities and how they can be implemented using the JAX package. Finally, the evermore package is presented as a showcase that already enables these optimizations with JAX.
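As an illustration of one such opportunity (independent of evermore's actual API), the sketch below vmaps and jits a toy binned Poisson negative log-likelihood so that many parameter points are evaluated in parallel; the single signal-strength model and the numbers are assumptions made for the example:

```python
import jax
import jax.numpy as jnp

# Toy binned model: observed counts plus fixed signal/background templates.
observed   = jnp.array([12.0, 25.0, 31.0, 18.0])
signal     = jnp.array([ 2.0,  5.0,  6.0,  3.0])
background = jnp.array([10.0, 20.0, 25.0, 15.0])

def nll(mu):
    # Poisson negative log-likelihood for a signal strength mu
    # (constant log(n!) terms dropped).
    expected = mu * signal + background
    return jnp.sum(expected - observed * jnp.log(expected))

# Parallelization opportunities: evaluate many mu values at once (vmap),
# compile the scan (jit), and get exact gradients for free (grad).
mus = jnp.linspace(0.0, 3.0, 301)
scan = jax.jit(jax.vmap(nll))(mus)
grad_at_best = jax.grad(nll)(mus[jnp.argmin(scan)])
```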
While advancements in software development practices across particle physics and the adoption of Linux container technology have made a substantial impact on the ease of replicability and reuse of analysis software stacks, the underlying software environments are still primarily bespoke builds that lack a full manifest to ensure reproducibility across time. The [HEP Packaging...
This talk covers histogram serialization development. We'll take a look at the new serialization specification being developed in UHI, look at how libraries can be developed to support serialization (such as boost-histogram), and work through some examples.
This is intended to be an introduction to serialization so that it can be a hackathon/sprint target later.
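Since the UHI specification is still being developed, the sketch below does not follow it; it only shows the kind of information a histogram serialization has to capture, pulled from a boost-histogram object into a JSON-ready dict (the field names are illustrative, not the UHI schema):

```python
import json
import numpy as np
import boost_histogram as bh

h = bh.Histogram(bh.axis.Regular(5, 0.0, 1.0), storage=bh.storage.Weight())
h.fill(np.random.default_rng(0).uniform(size=1000))

# Illustrative field names only; the real schema is defined in UHI.
payload = {
    "axes": [{"type": "regular", "edges": h.axes[0].edges.tolist()}],
    "values": h.values().tolist(),
    "variances": h.variances().tolist(),
}
print(json.dumps(payload))
```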
In the past year, development in Julia has led to the ability to statically compile small binaries (relative to the full runtime and LLVM).
In this presentation we briefly go over its basic principles and challenges, and demonstrate a proof-of-concept binding to FHist.jl.
Finally, we discuss some potential future uses as well as the ongoing development in larger...
The development of scientific data analyses is a resource-intensive process that often yields results with untapped potential for reuse and reinterpretation. In many cases, a developed analysis can be used to measure more than it was designed for, by changing its input data or parametrization. Building on the RECAST approach, which enables the reinterpretation of a physics analysis in the...
Statistical procedures at the end stages of analysis, such as hypothesis testing, likelihood scans, and pull plots, are currently implemented across multiple Python packages, yet lack interoperability despite performing similar functions once the log-likelihood is constructed. We present a contribution to HEPStats, part of the Scikit-HEP ecosystem, to provide a common interface for these final stages...
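To make the idea concrete, here is a hypothetical sketch (not the proposed HEPStats interface) of how final-stage procedures could share a single entry point once a negative log-likelihood is available:

```python
from typing import Protocol
import numpy as np

class CostFunction(Protocol):
    # Anything that maps parameter values to a negative log-likelihood.
    def __call__(self, params: np.ndarray) -> float: ...

def nll_scan(nll: CostFunction, grid: np.ndarray,
             index: int, start: np.ndarray) -> np.ndarray:
    """Hypothetical likelihood scan: evaluate the NLL along one parameter,
    holding the others at their starting values (no re-minimization)."""
    values = []
    for point in grid:
        params = start.copy()
        params[index] = point
        values.append(nll(params))
    return np.asarray(values)
```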
Statistical tooling in the scientific Python ecosystem continues to advance, while at the same time ROOT has recently adopted the HEP Statistics Serialization Standard (HS3) as the way of serializing RooWorkspaces for any probability model that has been built. There is a gap between packages such as jax and scipy.stats and what HS3 provides. This is where pyhs3 comes in: a modern...
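pyhs3's own API is not shown here; as a hedged illustration of the gap being bridged, the sketch below turns a tiny HS3-style fragment (field names simplified, not the actual HS3 schema) into a scipy.stats distribution:

```python
import json
from scipy import stats

# Simplified, illustrative fragment; the real HS3 schema is richer.
model_json = json.loads("""
{"distributions": [
    {"name": "signal_mass", "type": "gaussian", "mean": 125.0, "sigma": 2.5}
]}
""")

def build(spec):
    # Map a declarative description onto a scipy.stats object.
    if spec["type"] == "gaussian":
        return stats.norm(loc=spec["mean"], scale=spec["sigma"])
    raise NotImplementedError(spec["type"])

pdf = build(model_json["distributions"][0])
print(pdf.logpdf(126.0))
```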
Current statistical inference tools in high-energy physics typically focus on binned analyses and often rely on asymptotic approximations. However, present and future neutrinoless double beta decay experiments, such as the Large Enriched Germanium Experiment for Neutrinoless ββ Decay (LEGEND), operate in a quasi-background-free regime, where the expected number of...
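A minimal counting-experiment sketch (the numbers are made up, and the actual LEGEND analysis is unbinned and far more detailed) shows why asymptotics are questionable at such low counts, by comparing the asymptotic discovery p-value with one from background-only toys:

```python
import numpy as np
from scipy import stats

b = 0.5      # expected background counts (assumption for the example)
n_obs = 3    # observed counts

def q0(n, b):
    # Discovery test statistic for a Poisson counting model:
    # likelihood ratio of best-fit signal (clipped at zero) vs. no signal.
    s_hat = max(n - b, 0.0)
    if s_hat == 0.0:
        return 0.0
    return 2.0 * (n * np.log((s_hat + b) / b) - s_hat)

q_obs = q0(n_obs, b)
p_asymptotic = 0.5 * stats.chi2.sf(q_obs, df=1)   # half-chi2 approximation
toys = np.random.default_rng(1).poisson(b, size=200_000)
p_toys = np.mean([q0(n, b) >= q_obs for n in toys])
```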
Is it possible for software from individual collaborations to be packaged and maintained on conda-forge? There are many caveats involved, ranging from non-technical aspects such as licensing and usage to technical aspects such as cross-compilation, the large number of dependencies, and the configuration and parallel releases that may make this challenging. The collaborations I am thinking about...