France-Berkeley PHYSTAT Conference on Unfolding

Europe/Paris
Anja Butter (Centre National de la Recherche Scientifique (FR)), Ben Nachman (Lawrence Berkeley National Lab. (US)), Lydia Brenner (Nikhef National institute for subatomic physics (NL))
Description

A central task in differential cross-section measurements in particle, nuclear, and astrophysics is unfolding: the removal of detector distortions, also known as deblurring or deconvolution. Unfolding is a challenging inverse problem and simulation-based inference task.

The goal of this conference is to bring together method developers and practitioners to discuss the state of the art in unfolding. One key aspect of the conference will be machine learning-based unfolding methods, which have enabled new possibilities (e.g. unbinned and high-dimensional measurements).

The conference will be held at the LPNHE in Paris from June 10 to 13, 2024. Please note that the campus is accessed at the exit of the Jussieu metro station.

Amphithéâtre Georges Charpak · 75005 Paris, France

There will be a Zoom connection for remote participation as well.

Organizing Committee:

Olaf Behnke
Lydia Brenner
Anja Butter
Louis Lyons
Bogdan Malaescu
Ben Nachman
 

Acknowledgements: We are grateful to the France-Berkeley Fund for sponsorship and to PHYSTAT for logistical support.

   

Registration
Online participation (no fee)
Registration for in-person participation (€100 fee)
    • 11:30 13:30
      Registration 2h
    • 13:30 13:45
      Welcome and Logistics 15m
      Speakers: Anja Butter (Centre National de la Recherche Scientifique (FR)), Ben Nachman (Lawrence Berkeley National Lab. (US)), Lydia Brenner (Nikhef National institute for subatomic physics (NL))
    • 13:45 14:30
      Statistics overview (30'+15') 45m
      Speaker: Mikael Kuusela (Carnegie Mellon University (US))
    • 14:30 15:15
      HEP overview (30'+15') 45m
      Speaker: Philippe Gras (Université Paris-Saclay (FR))
    • 15:15 15:45
      Coffee 30m
    • 15:45 16:30
      ML overview (30'+15') 45m
      Speaker: Tilman Plehn (Heidelberg University)
    • 16:30 17:10
      Unbinned Discriminative ML methods overview (20'+20') 40m
      Speaker: Mariel Pettee (Lawrence Berkeley National Lab. (US))
    • 17:10 17:50
      Unbinned Generative ML methods overview (20'+20') 40m
      Speaker: Nathan Huetsch
    • 17:50 19:00
      Welcome Reception - Baker Street Pub 1h 10m
    • 09:00 09:40
      Binned ML methods overview (20'+20') 40m
      Speaker: Jingjing Pan (Yale University (US))
    • 09:40 10:20
      Performance / benchmarking with regularisation choice (20'+20') 40m
      Speaker: Lydia Brenner (Nikhef National institute for subatomic physics (NL))
    • 10:20 10:50
      Coffee 30m
    • 10:50 11:30
      Simplified Template Cross Sections (STXS) (20+20) 40m
      Speaker: Rahul Balasubramanian (Centre National de la Recherche Scientifique (FR))
    • 11:30 12:10
      Likelihood-based unfolding with the CMS Higgs combination tool (20+20) 40m

      The CMS Higgs combination tool (Combine) is the software package used for statistical analyses by the CMS Collaboration. Originally designed to perform searches for a Higgs boson and the combined analysis of those searches, the package has evolved into the statistical analysis tool used in the majority of measurements and searches performed by the CMS Collaboration. Since Combine has access to the full likelihood function, it can also be used to perform likelihood-based unfolding. This approach has become the standard unfolding procedure for many analyses within the collaboration.

      Speaker: Alessandro Tarabini (ETH Zurich (CH))
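      Likelihood-based unfolding of the kind described in this talk can be illustrated generically: with a known response matrix, the truth-level yields are treated as free parameters of a binned Poisson likelihood, which is then maximised. The sketch below is a minimal stand-in using scipy, not Combine itself; the response matrix and observed counts are invented for illustration.

      ```python
      import numpy as np
      from scipy.optimize import minimize

      # Toy response matrix R[i, j] = P(reco bin i | true bin j) and observed
      # counts n -- illustrative numbers, not from any real analysis.
      R = np.array([[0.7, 0.2, 0.0],
                    [0.3, 0.6, 0.3],
                    [0.0, 0.2, 0.7]])
      n = np.array([120.0, 180.0, 150.0])

      def nll(t):
          """Binned Poisson negative log-likelihood (constant terms dropped)."""
          mu = R @ np.abs(t)               # expected reco-level counts
          return np.sum(mu - n * np.log(mu + 1e-12))

      # Maximum-likelihood estimate of the truth-level yields
      res = minimize(nll, x0=np.full(3, n.sum() / 3))
      t_hat = np.abs(res.x)
      print(t_hat)
      ```

      In a real analysis the likelihood additionally carries nuisance parameters for systematic uncertainties, which are profiled in the same fit.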
    • 12:10 13:30
      Lunch 1h 20m
    • 13:30 14:10
      Unfolding is not unsmearing (20+20) 40m

      In particle physics, unfolding methods are employed when the basis used to represent an estimate of the truth is not the basis with statistically independent expansion coefficients. For the discrete unfolding problem, the latter is given by the eigenvectors of the Fisher information matrix, which measures the amount of information carried by the data about the truth. In typical cases this matrix is ill-conditioned, with the consequence that the measurements constrain only a small number of the expansion coefficients. This allows for highly efficient data reduction, but only at the cost of a biased estimate of the truth. Unfolding methods differ in how they bias the result; a way to quantify this is the posterior response matrix.

      Speaker: Michael Schmelling (Max Planck Society (DE))
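      The ill-conditioning described in this abstract is easy to reproduce numerically. The following is a minimal numpy sketch (not the speaker's code; the Gaussian response and flat truth spectrum are assumptions for illustration): it builds the Fisher information matrix for a binned Poisson model with expectation mu = R t and inspects its eigenvalue spectrum.

      ```python
      import numpy as np

      n_bins = 20
      centers = np.arange(n_bins)

      # Assumed Gaussian smearing: R[i, j] = P(reco bin i | true bin j)
      R = np.exp(-0.5 * ((centers[:, None] - centers[None, :]) / 1.5) ** 2)
      R /= R.sum(axis=0)                  # every true event lands in some reco bin

      t = np.full(n_bins, 1000.0)         # assumed flat truth spectrum
      mu = R @ t                          # expected reco-level counts

      # Fisher information for Poisson counts: I_jk = sum_i R_ij R_ik / mu_i
      I = (R / mu[:, None]).T @ R

      w = np.linalg.eigvalsh(I)           # ascending eigenvalues
      print("condition number:", w[-1] / w[0])
      ```

      The eigenvalues span many orders of magnitude: only the leading few eigenvector modes are well constrained by the data, which is exactly the source of the bias discussed in the talk.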
    • 14:10 14:50
      Unfolding in direct BSM search-sensitive regions of phase space (20'+20') 40m
      Speaker: Sarah Louise Williams (University of Cambridge (GB))
    • 14:50 15:20
      Coffee 30m
    • 15:20 16:00
      Using unfolded data in global QCD analyses (20+20) 40m
      Speaker: Oleksandr Zenaiev (Hamburg University)
    • 16:00 16:40
      Unfolding in the context of a heavy ion analysis (20+20) 40m

      In this study, unfolding is examined in the context of a heavy-ion photon-tagged jet analysis. The SVD and D'Agostini unfolding algorithms are compared, and the use of the mean squared error (MSE) to choose the regularization strength is demonstrated. The bias introduced by unfolding in relation to the choice of prior is also investigated. The performance is evaluated with different theoretical models and with the bottom-line test.

      Speaker: Molly Park (Massachusetts Inst. of Technology (US))
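      For readers unfamiliar with the D'Agostini algorithm compared in this talk, its iterative update can be sketched in a few lines of numpy (a generic illustration assuming a response matrix with normalised columns, i.e. full efficiency; this is not the analysis code).

      ```python
      import numpy as np

      def dagostini_unfold(n, R, n_iter=10):
          """Iterative Bayesian (D'Agostini) unfolding.

          n : observed reco-level counts
          R : response matrix, R[i, j] = P(reco bin i | true bin j),
              columns assumed to sum to 1 (full efficiency).
          """
          t = np.full(R.shape[1], n.sum() / R.shape[1])  # flat starting prior
          for _ in range(n_iter):
              mu = R @ t                   # current reco-level prediction
              t = t * (R.T @ (n / mu))     # multiplicative Bayesian update
          return t
      ```

      Stopping after a small number of iterations acts as regularisation; iterating to convergence reproduces the unregularised (and typically noisy) maximum-likelihood solution, which is why the choice of regularisation strength discussed in the talk matters.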
    • 16:40 17:20
      Response Matrix Estimation in Unfolding Differential Cross Sections (20+20) 40m

      In the unfolding problem, the response matrix is the forward operator that models the detector response. In practice, the response matrix is not known analytically; instead, it must be estimated using Monte Carlo simulation, which introduces statistical uncertainty into the unfolding procedure. This raises the question of how to estimate the response matrix in a sensible way. In most analyses at the LHC, this is done by binning the events and counting the migrations from bin to bin. However, this approach can suffer from undersmoothing, especially for small sample sizes. To address this issue, we propose a two-step approach to response matrix estimation: first, we estimate the response kernel on the unbinned space; second, we propagate the estimated kernel through an integral equation to obtain an estimate of the response matrix.

      Speaker: Richard Zhu (Carnegie Mellon University)
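      For context, the standard binned estimate mentioned in this abstract (counting Monte Carlo migrations from truth bin to reco bin) looks as follows; the smearing model and sample size are invented for illustration.

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      edges = np.linspace(-3.0, 3.0, 11)                 # 10 truth/reco bins

      # Toy Monte Carlo: truth values plus Gaussian detector smearing
      truth = rng.normal(0.0, 1.0, 100_000)
      reco = truth + rng.normal(0.0, 0.5, truth.size)

      # Count migrations truth-bin -> reco-bin, then normalise each truth column
      H, _, _ = np.histogram2d(reco, truth, bins=[edges, edges])
      col_sums = H.sum(axis=0)
      R_hat = H / np.where(col_sums > 0, col_sums, 1.0)  # R_hat[i, j] ~ P(reco i | true j)
      ```

      Each column of R_hat is a multinomial estimate whose noise grows as the per-bin sample size shrinks, which is the motivation for the unbinned, kernel-based estimate proposed in the talk.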
    • 09:00 09:40
      Dealing with Uncertainties (20+20) 40m
      Speaker: Kyle Cormier (University of Zurich (CH))
    • 09:40 10:20
      Unfolding in the context of g-2 (20+20) 40m
      Speaker: Laurent Lellouch (CNRS and Aix-Marseille U.)
    • 10:20 10:50
      Coffee 30m
    • 10:50 11:30
      Profile likelihood unfolding with large number of bins (20+20) 40m

      In this talk we will discuss previous measurements using binned maximum-likelihood unfolding, focusing on analyses with large numbers of bins and nuisance parameters. We will outline the technical implementation and highlight the challenges and limitations. Furthermore, we showcase the newly developed approach of "linearized binned likelihood unfolding", a modified formalism that scales better and allows unfolding with even larger numbers of bins and nuisance parameters.

      Speaker: David Walter (CERN)
    • 11:30 12:10
      Moment Unfolding (20+20) 40m
      Speaker: Krish Desai
    • 12:10 13:30
      Lunch 1h 20m
    • 13:30 17:30
      Tutorials
      Conveners: Dr Carsten Burgard (Technische Universitaet Dortmund (DE)), Fernando Torales Acosta, Javier Mariño Villadamigo, Lydia Brenner (Nikhef National institute for subatomic physics (NL)), Nathan Huetsch, Vincent Alexander Croft (Nikhef National institute for subatomic physics (NL))
    • 18:50 20:50
      Conference dinner - Ciel de Paris 2h

      Restaurant Le Ciel de Paris
      Tour Maine Montparnasse
      56 ème étage
      33, avenue du Maine
      75015 Paris

      Accès restaurant par l'ascenseur "Le Ciel de Paris"

    • 09:30 10:10
      QUnfold: Quantum Annealing for Distributions Unfolding in High-Energy Physics (20+20) 40m

      In High-Energy Physics (HEP) experiments, each measurement apparatus exhibits a unique signature in terms of detection efficiency, resolution, and geometric acceptance. The overall effect is that the distribution of each observable measured in a given physical process can be smeared and biased. Unfolding is the statistical technique employed to correct for this distortion and restore the original distribution. This process is essential for meaningful comparisons between the outcomes of different experiments and theoretical predictions.
      The emerging technology of Quantum Computing represents an enticing opportunity to enhance the unfolding performance and potentially yield more accurate results.
      This work introduces QUnfold, a simple Python module designed to address the unfolding challenge by harnessing the capabilities of quantum annealing. In particular, the regularized log-likelihood minimization formulation of the unfolding problem is translated into a Quadratic Unconstrained Binary Optimization (QUBO) problem, which can be solved on quantum annealing systems. The algorithm is validated on a simulated sample of particle-collision data generated by combining the MadGraph Monte Carlo event generator with the Delphes simulation software to model the detector response. A variety of fundamental kinematic distributions are unfolded, and the results are compared with conventional unfolding algorithms commonly adopted in precision measurements at the Large Hadron Collider (LHC) at CERN.
      The implementation of the quantum unfolding model relies on the D-Wave Ocean software, and the algorithm is run both with heuristic classical solvers and on the physical D-Wave Advantage quantum annealer, which offers more than 5000 qubits.

      Speakers: Dr Gianluca Bianco (Universita e INFN, Bologna (IT)), Simone Gasperini (Universita e INFN, Bologna (IT))
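      The QUBO translation described here can be sketched generically: expand each truth-level count in binary digits, so that a quadratic objective in t becomes x^T Q x over binary variables x. The sketch below uses the simpler least-squares objective ||R t - n||^2 + lam ||t||^2 (rather than QUnfold's regularized log-likelihood) to show the binary-encoding step; it is an illustrative reconstruction, not QUnfold's actual implementation.

      ```python
      import numpy as np

      def unfolding_qubo(R, n, lam=0.0, n_bits=3):
          """Build the QUBO matrix Q for min ||R t - n||^2 + lam ||t||^2,
          with each truth bin encoded as t_j = sum_k 2^k x_{j,k}, x binary."""
          nb = R.shape[1]
          A = R.T @ R + lam * np.eye(nb)   # quadratic part in t
          b = -2.0 * R.T @ n               # linear part in t
          # encoding matrix: t = P @ x
          P = np.kron(np.eye(nb), 2.0 ** np.arange(n_bits))
          # for binary x, x_k^2 = x_k, so linear terms go on the diagonal
          return P.T @ A @ P + np.diag(P.T @ b)
      ```

      The matrix Q can then be handed to a simulated or physical annealer; the minimising bit string decodes back to the unfolded spectrum via t = P x.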
    • 10:10 10:50
      Unfolding using Denoising Diffusion (20+20) 40m

      Unfolding detector distortions in experimental data is critical for enabling precision measurements in high-energy physics (HEP). However, traditional unfolding methods face challenges in scalability, flexibility, and dependence on simulations. We introduce a novel unfolding approach using conditional denoising diffusion probabilistic models (cDDPM). By modeling the conditional probability density between detector-level observations and truth-level particle properties across various physics processes, the cDDPM generalizes to new simulated processes and kinematic distributions without retraining. We demonstrate a proof of concept on toy models and evaluate the method on simulated Large Hadron Collider jets from different physics processes.

      Speaker: Camila Pazos (Tufts University (US))
    • 10:50 11:20
      Coffee 30m
    • 11:20 12:00
      Full Event Particle-Level Unfolding with Variable Length Latent Variational Diffusion (20+20) 40m

      Collisions at the Large Hadron Collider (LHC) provide information about the values of parameters in theories of fundamental physics. Extracting measurements of these parameters requires accounting for effects introduced by the particle detector used to observe the collisions. The typical approach is to use a high-fidelity simulation of the detector to generate synthetic datasets that can then be compared directly with experimental data. However, these simulations are often proprietary and computationally expensive. An alternative approach, unfolding, statistically adjusts the experimental data for detector effects. Traditional unfolding algorithms require binning the data in a small set of pre-selected dimensions. Recent methods based on generative machine learning models have shown promise for performing unbinned unfolding in high dimensions, allowing many observables to be computed after the fact. However, all current generative approaches are limited to unfolding a fixed set of observables, making them unable to perform full-event unfolding in the variable-dimensional environment of collider data. A novel modification to the variational latent diffusion model (VLD) approach to generative unfolding is presented, which allows for unfolding of high- and variable-dimensional feature spaces. The performance of this method is evaluated in the context of semi-leptonic $t\bar{t}$ production at the LHC. Additionally, the dependence of the unfolding on the training-data prior is assessed by evaluating the model on datasets with alternative priors.

      Speakers: Alexander Shmakov (University of California Irvine (US)), Kevin Thomas Greif (University of California Irvine (US))
    • 12:00 13:30
      Lunch 1h 30m
    • 13:30 14:00
      A 24-dimensional Cross-section Measurement with ATLAS (15+15) 30m
      Speaker: Mariel Pettee (Lawrence Berkeley National Lab. (US))
    • 14:00 14:30
      Multidimensional cross-section measurements with H1 (15+15) 30m
      Speaker: Fernando Torales Acosta (Lawrence Berkeley National Lab. (US))
    • 14:30 15:00
      Open discussion on challenges and possible solutions in unfolding 30m
      Speaker: Bogdan Malaescu (LPNHE-Paris CNRS/IN2P3 (FR))
    • 15:00 15:30
      Closing of the meeting and summary 30m
      Speaker: Ben Nachman (Lawrence Berkeley National Lab. (US))