15–18 Sept 2025
CEA Paris-Saclay
Europe/Paris timezone

Uncertainty Quantification in an ML Pattern Recognition Pipeline

17 Sept 2025, 15:30
30m
Amphithéâtre Claude Bloch (IPhT) (CEA Paris-Saclay)

Bât. 774 - Institut de Physique Théorique (IPhT), F-91190 Gif-sur-Yvette, France
Short talk: Deep Learning and Uncertainty Quantification

Speaker

Lukas Péron

Description

Geometric learning pipelines have achieved state-of-the-art performance in High-Energy and Nuclear Physics reconstruction tasks such as flavor tagging and particle tracking [1]. Starting from a point cloud of detector or particle-level measurements, a graph is built in which the measurements are nodes and the edges represent all possible physical relationships between them. Depending on the size of the resulting input graph, a filtering stage may be needed to sparsify the graph connections. A Graph Neural Network then builds a latent representation of the input graph that can be used, for example, to predict whether two nodes (measurements) belong to the same particle, or to classify a node as noise. The graph may then be partitioned into particle-level subgraphs, and a regression task used to infer the particle properties. Evaluating the uncertainty of the overall pipeline is important to measure and increase the statistical significance of the final result. How do we measure the uncertainty of the predictions of a multistep pattern recognition pipeline? How do we know which step of the pipeline contributes the most to the prediction uncertainty? And how do we distinguish between irreducible uncertainties arising from the aleatoric nature of our input data (detector noise, multiple scattering, etc.) and epistemic uncertainties that we could reduce by using, for example, a larger model or more training data?
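The graph-construction step described above can be sketched in a few lines. This is a deliberately minimal toy (plain NumPy, 2-D hits, a simple distance cut standing in for the physics-motivated edge construction and filtering stage); the names `build_edges` and `max_dist` are illustrative and not taken from the acorn code.

```python
import numpy as np

def build_edges(hits, max_dist=1.0):
    """Toy graph construction: connect every pair of hits closer than max_dist.

    hits: (N, 2) array of measurements (hypothetical 2-D detector layout).
    Returns an (E, 2) integer array of node-index pairs, i.e. the edge list
    that a downstream GNN would score as same-particle vs. not.
    """
    n = len(hits)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(hits[i] - hits[j]) < max_dist:
                edges.append((i, j))
    return np.array(edges)

# Toy point cloud: two nearby hits (plausibly the same track) and one far away.
hits = np.array([[0.0, 0.0], [0.5, 0.0], [5.0, 5.0]])
edges = build_edges(hits)
```

In a real pipeline the distance cut would be replaced by a learned or geometry-aware filter, precisely because connecting "all possible" pairs scales quadratically with the number of hits.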

We have developed an Uncertainty Quantification process for multistep pipelines to study these questions and applied it to the acorn particle tracking pipeline [2]. All our experiments use the TrackML open dataset [3]. Using the Monte Carlo Dropout method, we measure the data and model uncertainties of the pipeline steps, study how they propagate down the pipeline, and study how they are affected by the size of the training dataset and by the geometry and physical properties of the input data. We will show that, for our case study, the overall uncertainty becomes dominated by aleatoric uncertainty as the training dataset grows, indicating that we had sufficient data to train the chosen acorn model to its full potential. We also show that the acorn pipeline yields high confidence in the track reconstruction and does not suffer from miscalibration of the GNN model.
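The core of Monte Carlo Dropout is to keep dropout active at inference time and run several stochastic forward passes: the sample mean is the prediction and the sample variance is a proxy for the model (epistemic) uncertainty. The following is a minimal NumPy sketch of that idea with a toy two-layer network; the acorn pipeline applies the same principle to its GNN stages, not to a model like this, and all names here (`mc_forward`, `mc_dropout_predict`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed two-layer "model"; in practice these are trained weights.
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 1))

def mc_forward(x, p=0.5):
    """One stochastic forward pass with dropout left ON at inference."""
    h = np.maximum(x @ W1, 0.0)        # ReLU hidden layer
    mask = rng.random(h.shape) > p     # randomly drop hidden units
    h = h * mask / (1.0 - p)           # inverted-dropout rescaling
    return (h @ W2).item()

def mc_dropout_predict(x, T=200):
    """T stochastic passes: mean = prediction, variance = epistemic proxy."""
    samples = np.array([mc_forward(x) for _ in range(T)])
    return samples.mean(), samples.var()

x = np.ones(4)
mean, var = mc_dropout_predict(x)
```

The spread across passes reflects only model uncertainty; separating out the aleatoric component additionally requires a data-noise estimate (e.g. a predicted variance head), which is the part that lets one ask whether more training data would still help.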

References:
[1] Graph Neural Networks in Particle Physics: Implementations, Innovations, and Challenges, arXiv:2203.12852
[2] acorn, GNN4ITkTeam
[3] TrackML particle tracking challenge dataset

Co-authors

Jay Chan (Lawrence Berkeley National Lab. (US)), Paolo Calafiura (Lawrence Berkeley National Lab. (US)), Xiangyang Ju (Lawrence Berkeley National Lab. (US))
