Conveners
Track 2: Data Analysis - Algorithms and Tools
- chair: Frank Gaede
- co-chair: Daniel Murnane
Track 2: Data Analysis - Algorithms and Tools
- chair: Davide Valsecchi
- co-chair: Daniel Murnane
Track 2: Data Analysis - Algorithms and Tools
- chair: Luisa Lucie-Smith
- co-chair: Louis Moureaux
Track 2: Data Analysis - Algorithms and Tools
- chair: Frank Gaede
- co-chair: Luisa Lucie-Smith
Track 2: Data Analysis - Algorithms and Tools
- chair: Tilman Plehn
- co-chair: Karim El Morabit
Track 2: Data Analysis - Algorithms and Tools
- chair: Thea Aarrestad
- co-chair: David Rousseau
Track 2: Data Analysis - Algorithms and Tools
- chair: Thea Aarrestad
- co-chair: Tilman Plehn
Track reconstruction is a cornerstone of modern collider experiments, and the HL-LHC ITk upgrade for ATLAS poses new challenges with its increased number of silicon hit clusters and strict throughput requirements. Deep learning approaches compare favorably with traditional combinatorial ones, as shown by the GNN4ITk project, a geometric learning tracking pipeline that achieves competitive physics...
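As background to pipelines of this kind, the central learned step is typically an edge classifier on a graph of detector hits. The sketch below is a generic PyTorch illustration of that step; the hit features, candidate-edge construction, and score threshold are placeholder assumptions, not the GNN4ITk implementation.

```python
# Minimal sketch (not the GNN4ITk code): score candidate hit-pair edges with a
# small network; edges kept above threshold are later walked to build track candidates.
import torch
import torch.nn as nn

class EdgeClassifier(nn.Module):
    def __init__(self, n_hit_features=3, hidden=64):
        super().__init__()
        # Encode each hit, then score the concatenated embeddings of an edge's endpoints.
        self.encoder = nn.Sequential(nn.Linear(n_hit_features, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden), nn.ReLU())
        self.edge_head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                       nn.Linear(hidden, 1))

    def forward(self, hits, edge_index):
        h = self.encoder(hits)                    # (n_hits, hidden)
        src, dst = edge_index                     # (2, n_edges) candidate edges
        scores = self.edge_head(torch.cat([h[src], h[dst]], dim=-1))
        return torch.sigmoid(scores).squeeze(-1)  # edge "same-track" probability

# Toy usage: 100 hits with (r, phi, z) coordinates and 500 candidate edges.
hits = torch.randn(100, 3)
edge_index = torch.randint(0, 100, (2, 500))
edge_scores = EdgeClassifier()(hits, edge_index)
keep = edge_scores > 0.5                          # edges retained for track building
```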
Precision measurements of Higgs, W, and Z bosons at future lepton colliders demand jet energy reconstruction with unprecedented accuracy. The particle flow approach has proven to be an effective method for achieving the required jet energy resolution. We present CyberPFA, a particle flow algorithm specifically optimized for the particle-flow-oriented crystal bar electromagnetic calorimeter...
We present lightweight, attention-enhanced Graph Neural Networks (GNNs) tailored for real-time particle reconstruction and identification in LHCb’s next-generation calorimeter. Our architecture builds on node-centric GarNet layers, which eliminate costly edge message passing and are optimized for FPGA deployment, achieving sub-microsecond inference latency. By integrating attention mechanisms...
We present a versatile GNN-based end-to-end reconstruction algorithm for highly granular calorimeters that can include track and timing information to aid the reconstruction of particles. The algorithm starts directly from calorimeter hits and possibly reconstructed tracks, and outputs a coordinate transformation in which all shower objects are well separated from each other and assigned...
With the upcoming High-Luminosity upgrades at the LHC, data generation rates are expected to increase significantly. This calls for highly efficient architectures for machine learning inference in experimental workflows like event reconstruction, simulation, and data analysis.
In the ML4EP team at CERN, we have developed SOFIE, a tool within the ROOT/TMVA package that translates externally...
Particle physics experiments rely on the (generalised) likelihood ratio test (LRT) for searches and measurements. This is not guaranteed to be optimal for composite hypothesis tests, as the Neyman-Pearson lemma pertains only to simple hypothesis tests. An improvement in the core statistical testing methodology would have widespread ramifications across experiments. We discuss an alternative test...
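For reference, the standard profile likelihood ratio that such analyses currently employ can be written as follows; this is textbook notation rather than the alternative test discussed in the contribution.

```latex
% Profile likelihood ratio for a parameter of interest \mu with nuisance
% parameters \theta (standard form; the contribution's alternative test is
% not reproduced here).
\[
  \lambda(\mu) \;=\;
  \frac{L\bigl(\mu,\hat{\hat{\theta}}(\mu)\bigr)}{L\bigl(\hat{\mu},\hat{\theta}\bigr)},
  \qquad
  q_\mu \;=\; -2\ln\lambda(\mu),
\]
where $\hat{\hat{\theta}}(\mu)$ maximises $L$ at fixed $\mu$ and
$(\hat{\mu},\hat{\theta})$ is the global maximum-likelihood estimate.
```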
Neural Simulation-Based Inference (NSBI) is a powerful class of machine learning (ML)-based methods for statistical inference that naturally handle high dimensional parameter estimation without the need to bin data into low-dimensional summary histograms. Such methods are promising for a range of measurements at the Large Hadron Collider, where no single observable may be optimal to scan over...
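As background, the density-ratio ("likelihood ratio") trick that NSBI methods typically build on can be sketched as follows; the toy data, network size, and equal-prior assumption are illustrative and not taken from the contribution.

```python
# Sketch of classifier-based likelihood-ratio estimation: train a classifier to
# separate events simulated under two hypotheses, then convert its output into
# a per-event (log) likelihood ratio.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
x_hyp0 = rng.normal(0.0, 1.0, size=(20_000, 5))   # events simulated under H0
x_hyp1 = rng.normal(0.2, 1.0, size=(20_000, 5))   # events simulated under H1

X = np.vstack([x_hyp0, x_hyp1])
y = np.concatenate([np.zeros(len(x_hyp0)), np.ones(len(x_hyp1))])
clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=20).fit(X, y)

def log_likelihood_ratio(x):
    # For a calibrated classifier s(x) = p(H1|x) trained on balanced samples,
    # the per-event ratio is p(x|H1)/p(x|H0) = s/(1-s).
    s = clf.predict_proba(x)[:, 1]
    return np.log(s) - np.log1p(-s)

# Summed over observed events, this yields an unbinned test statistic.
x_obs = rng.normal(0.1, 1.0, size=(1_000, 5))
print(log_likelihood_ratio(x_obs).sum())
```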
We present a modular, data-driven framework for calibration and performance correction in the ALICE experiment. The method addresses time- and parameter-dependent effects in high-occupancy heavy-ion environments, where evolving detector conditions (e.g., occupancy and cluster overlaps, gain drift, space charge, dynamic distortions, and reconstruction or calibration deficiencies) require...
The Jiangmen Underground Neutrino Observatory (JUNO) is a next-generation 20-kton liquid scintillator detector under construction in southern China. It is designed to determine the neutrino mass ordering via the measurement of reactor neutrino oscillation, and also to study other physics topics including atmospheric neutrinos, supernova neutrinos, and more. The detector's large mass and high...
The application of foundation models in high-energy physics has recently been proposed as a way to use large unlabeled datasets to efficiently train powerful task-specific models. The aim is to train a task-agnostic model on an existing large dataset such that the learned representation can later be utilized for subsequent downstream physics tasks.
The pretrained model can reduce the training...
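The generic "pretrain once, fine-tune per task" pattern can be sketched as follows: a large pretrained backbone is frozen and only a small task-specific head is trained on the downstream dataset. The backbone, checkpoint name, and dataset here are placeholders, not the foundation model described in the abstract.

```python
# Hedged sketch of downstream fine-tuning with a frozen pretrained backbone.
import torch
import torch.nn as nn

backbone = nn.Sequential(            # stands in for a large pretrained encoder
    nn.Linear(16, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
)
# backbone.load_state_dict(torch.load("pretrained_backbone.pt"))  # hypothetical checkpoint

for p in backbone.parameters():      # freeze the task-agnostic representation
    p.requires_grad = False

head = nn.Linear(128, 2)             # small task-specific classifier head
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Toy downstream dataset: only the head is updated, so far fewer labelled
# events are needed than when training from scratch.
x = torch.randn(256, 16)
labels = torch.randint(0, 2, (256,))
for _ in range(10):
    loss = loss_fn(head(backbone(x)), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```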
OmniJet-alpha, the first cross-task foundation model for particle physics, was first presented at ACAT 2024. In its base configuration, OmniJet-alpha is capable of transfer learning between an unsupervised problem (jet generation) and a classic supervised task (jet tagging). Since its release, we have also shown that it can successfully transfer from CMS Open Data to simulation, and even...
Large backgrounds and detector aging impact the track finding in the Belle II central drift chamber, reducing both purity and efficiency in events. This necessitates the development of new tracking algorithms to mitigate detector performance degradation. Building on our previous success with an end-to-end multi-track reconstruction algorithm for the Belle II experiment at the SuperKEKB collider...
Detailed event simulation at the LHC consumes a large fraction of the computing budget. CMS has developed an end-to-end ML-based simulation that can speed up the production of analysis samples by several orders of magnitude with a limited loss of accuracy. As the CMS experiment is adopting a common analysis-level format, the NANOAOD, for a larger number of analyses, such an event...
The Matrix Element Method (MEM) offers optimal statistical power for hypothesis testing in particle physics, but its application is hindered by the computationally intensive multi-dimensional integrals required to model detector effects. We present a novel approach that addresses this challenge by employing Transformers and generative machine learning (ML) models. Specifically, we utilize ML...
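The computationally intensive object referred to here is the per-event MEM likelihood, written schematically below; the notation ($\sigma_\alpha$ for the total cross section, $W(x|y)$ for the transfer function, $f_1, f_2$ for the parton densities folded into the parton-level configuration $y$) is standard and not taken from the contribution itself.

```latex
% Schematic MEM likelihood for reconstructed observables x under hypothesis
% \alpha; the integral over parton-level phase space y is the expensive step
% that the abstract proposes to accelerate with ML.
\[
  P(x \mid \alpha) \;=\;
  \frac{1}{\sigma_\alpha}
  \int \mathrm{d}\Phi(y)\; f_1(y)\, f_2(y)\,
  \bigl|\mathcal{M}_\alpha(y)\bigr|^{2}\, W(x \mid y).
\]
```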
Measurements of neutral, oscillating mesons are a gateway to quantum mechanics and give access to the fundamental interactions of elementary particles. For example, precise measurements of $CP$ violation in neutral $B$ mesons can be performed to test the Standard Model of particle physics. These measurements require knowledge of the $B$-meson flavour at the time of its production, which...
We construct Lorentz-equivariant transformer and graph networks using the concept of local canonicalization. While many Lorentz-equivariant architectures use specialized layers, this approach allows any existing non-equivariant architecture to be made Lorentz-equivariant via transformations with equivariantly predicted local frames. In addition, data augmentation emerges as a...
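Schematically, canonicalization yields equivariance as follows; the frame field $\Lambda(x)$, output representation $\rho$, and backbone $g$ are generic notation for the idea, not the exact construction in the contribution.

```latex
% If the predicted frame transforms equivariantly, any backbone g becomes
% Lorentz-equivariant after mapping inputs to the canonical frame.
\[
  f(x) \;=\; \rho\bigl(\Lambda(x)\bigr)\, g\bigl(\Lambda(x)^{-1} x\bigr),
  \qquad
  \Lambda(\Lambda' x) = \Lambda'\,\Lambda(x)
  \;\Rightarrow\;
  f(\Lambda' x) = \rho(\Lambda')\, f(x)
  \quad \forall\, \Lambda' \in \mathrm{SO}(1,3).
\]
```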
Modern machine learning (ML) algorithms are sensitive to the specification of non-trainable parameters called hyperparameters (e.g., learning rate or weight decay). Without guiding principles, hyperparameter optimization is the computationally expensive process of sweeping over various model sizes and, at each, re-training the model over a grid of hyperparameter settings. However, recent...
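For illustration, the brute-force procedure alluded to above looks like the following; the grid values and the placeholder training function are assumptions made for this sketch only.

```python
# Toy illustration of unguided hyperparameter optimization: every model width
# is re-tuned over a full grid of learning rates and weight decays, so the
# number of full trainings multiplies quickly.
from itertools import product

widths         = [64, 128, 256, 512]
learning_rates = [3e-4, 1e-3, 3e-3]
weight_decays  = [0.0, 1e-4, 1e-2]

def train_and_validate(width, lr, wd):
    # placeholder for a full training run returning a validation loss
    return (width * lr * (1 + wd)) % 1.0

results = {cfg: train_and_validate(*cfg)
           for cfg in product(widths, learning_rates, weight_decays)}
best = min(results, key=results.get)
print(f"{len(results)} full trainings; best (width, lr, wd) = {best}")
```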
Deep generative models have become powerful tools for alleviating the computational burden of traditional Monte Carlo generators in producing high-dimensional synthetic data. However, validating these models remains challenging, especially in scientific domains requiring high precision, such as particle physics. Two-sample hypothesis testing offers a principled framework to address this task....
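One concrete instance of such a test is a kernel two-sample test; the sketch below uses a Gaussian-kernel maximum mean discrepancy (MMD) with a permutation p-value as an illustrative example, not the specific set of tests studied in the contribution.

```python
# Sketch of a two-sample test between a reference sample and a generative
# model's output: a Gaussian-kernel MMD with a permutation-based p-value.
import numpy as np

def mmd2(x, y, bandwidth=1.0):
    """Squared MMD with a Gaussian kernel (biased V-statistic, adequate for a sketch)."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(1)
reference = rng.normal(0.0, 1.0, size=(300, 4))   # e.g. Monte Carlo reference sample
generated = rng.normal(0.05, 1.0, size=(300, 4))  # e.g. output of a generative model
observed = mmd2(reference, generated)

# Permutation test: shuffle the pooled sample to build the null distribution.
pooled = np.vstack([reference, generated])
null = []
for _ in range(100):
    rng.shuffle(pooled)
    null.append(mmd2(pooled[:300], pooled[300:]))
p_value = (np.sum(np.array(null) >= observed) + 1) / (len(null) + 1)
print(f"MMD^2 = {observed:.4f}, permutation p-value ~ {p_value:.3f}")
```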
Charged track reconstruction is a critical task in nuclear physics experiments, enabling the identification and analysis of particles produced in high-energy collisions. Machine learning (ML) has emerged as a powerful tool for this purpose, addressing the challenges posed by complex detector geometries, high event multiplicities, and noisy data. Traditional methods rely on pattern recognition...
The Compton Spectrometer and Imager (COSI) is a NASA Small Explorer (SMEX) satellite mission planned to fly in 2027. It involves institutions in the US, Europe, and Asia and aims to construct a gamma-ray telescope for observations in the 0.2-5 MeV energy range. COSI consists of an array of germanium strip detectors cooled to cryogenic temperatures with millimeter...
Beyond the planet Neptune, only the largest solar system objects can be observed directly. However, there are tens of thousands of smaller objects whose frequency and distribution could provide valuable insights into the formation of our solar system - if we could see them.
Project SOWA (Solar-system Occultation Watch and Analysis) aims to systematically search for such invisible objects...
The next generation of ground-based gamma-ray astronomy instruments will involve arrays of dozens of telescopes, leading to an increase in operational and analytical complexity. This scale-up poses challenges for both system operations and offline data processing, especially when conventional approaches struggle to scale effectively. To address these challenges, we are developing AI agents...
In many domains of science, the likelihood function is a fundamental ingredient for statistically inferring model parameters from data, with the likelihood ratio (LR) serving as an optimal test statistic. Neural LR estimation using probabilistic classification has therefore had a significant impact in these domains, providing a scalable method for determining an intractable LR from simulated...
In anticipation of higher luminosities at the Belle II experiment, high levels of beam background from outside of the interaction region are expected. To prevent track trigger rates from surpassing the limitations of the data acquisition system, an upgrade of the first-level neural track trigger becomes indispensable. This upgrade contains a novel track-finding algorithm based on...
The LHCb experiment at the Large Hadron Collider (LHC) operates a fully software-based trigger system that processes proton-proton collisions at a rate of 30 MHz, reconstructing both charged and neutral particles in real time. The first stage of this trigger system, running on approximately 500 GPU cards, performs a track pattern recognition to reconstruct particle trajectories with low...
The upgraded LHCb experiment is pioneering the landscape of real-time data-processing techniques using a heterogeneous computing infrastructure, composed of both GPUs and FPGAs, aimed at boosting the performance of the HLT1 reconstruction. Amongst the novelties in the reconstruction infrastructure made for Run 3, the introduction of a real-time VELO hit-finding FPGA-based architecture...
Charged particle track reconstruction is one of the heaviest computational tasks in the event reconstruction chain at Large Hadron Collider (LHC) experiments. Furthermore, projections for the High Luminosity LHC (HL-LHC) show that the required computing resources for single-threaded CPU algorithms will exceed those that are expected to be available. It follows that experiments at the HL-LHC will...
The exponential time scaling of traditional primary vertex reconstruction algorithms raises significant performance concerns for future high-pileup environments, particularly with the upcoming High Luminosity upgrade to the Large Hadron Collider. In this talk, we introduce PV-Finder, a deep learning-based approach that leverages reconstructed track parameters to directly predict primary vertex...
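As a rough illustration of the histogram-to-histogram style of network used in approaches of this kind, the sketch below maps a track-density histogram along the beamline to a per-bin vertex probability; the input representation, bin count, and layer sizes are assumptions of this sketch, not the PV-Finder architecture itself.

```python
# Schematic sketch (not PV-Finder): a small 1D CNN turning a histogram of
# track z positions along the beamline into a per-bin vertex probability.
import torch
import torch.nn as nn

class VertexFinderCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=25, padding=12), nn.ReLU(),
            nn.Conv1d(16, 16, kernel_size=25, padding=12), nn.ReLU(),
            nn.Conv1d(16, 1, kernel_size=5, padding=2),
        )

    def forward(self, z_hist):
        # z_hist: (batch, 1, n_bins) track-density histogram along z
        return torch.sigmoid(self.net(z_hist))   # per-bin vertex probability

model = VertexFinderCNN()
z_hist = torch.rand(8, 1, 4000)                  # toy batch of track-density histograms
vertex_prob = model(z_hist)                      # peaks indicate primary vertex candidates
```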
Unfolding detector-level data into meaningful particle-level distributions remains a key challenge in collider physics, especially as the dimensionality of the relevant observables increases. Traditional unfolding techniques often struggle with such high-dimensional problems, motivating the development of machine learning-based approaches. We introduce a new method for generative unfolding that...
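The forward problem that unfolding inverts can be written schematically as below; $p_\theta(x \mid y)$ denotes a generic learned conditional density used for sampling, an assumption of this sketch rather than the specific model presented in the contribution.

```latex
% Detector-level density as a smearing of the particle-level density; a
% generative unfolding model samples particle-level x conditioned on the
% observed detector-level y.
\[
  p_{\mathrm{det}}(y) \;=\; \int \mathrm{d}x \; p(y \mid x)\, p_{\mathrm{true}}(x),
  \qquad
  \hat{x} \sim p_\theta(x \mid y).
\]
```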
Two shortcomings of classical unfolding algorithms, namely that they are restricted to binned, one-dimensional observables, can be overcome by using generative machine learning. Many studies on generative unfolding reduce the problem to correcting for detector smearing; however, a full unfolding pipeline must also account for background, acceptance, and efficiency effects. To fully integrate...
Measured distributions are usually distorted by the finite resolution of the detector. Within physics research, the necessary correction of these distortions is known as Unfolding. Machine learning research uses a different term for this very task: Quantification Learning. For the past two decades, this difference in terminology, together with several differences in notation, has prevented...
The High-Luminosity LHC era will deliver unprecedented data volumes, enabling measurements on fine-grained multidimensional histograms containing millions of bins with thousands of events each. Achieving ultimate precision requires modeling thousands of systematic uncertainty sources, creating computational challenges for likelihood minimization and parameter extraction. Fast minimization is...
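One common form of the binned likelihood being minimised is sketched below; the Poisson-times-constraints structure and the symbols ($\mu$ for the parameters of interest, $\theta$ for nuisance parameters, $\rho_j$ for constraint terms) are standard notation, not specifics of the contribution.

```latex
% Schematic binned likelihood: products over millions of bins and constraint
% terms over thousands of nuisance parameters make fast minimisation essential.
\[
  L(\mu, \theta) \;=\;
  \prod_{i \in \mathrm{bins}}
  \mathrm{Pois}\!\bigl(n_i \,\big|\, \mu\, s_i(\theta) + b_i(\theta)\bigr)
  \times
  \prod_{j \in \mathrm{syst}} \rho_j(\theta_j).
\]
```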