France-Berkeley PHYSTAT Conference on Unfolding

Monday, June 10, 2024

11:30 AM Registration
11:30 AM - 1:30 PM

1:30 PM Welcome and Logistics
- Lydia Brenner (Nikhef National institute for subatomic physics (NL))
- Anja Butter (Centre National de la Recherche Scientifique (FR))
- Ben Nachman (Lawrence Berkeley National Lab. (US))

1:30 PM - 1:45 PM

1:45 PM Statistics overview (30'+15')
- Mikael Kuusela (Carnegie Mellon University (US))

1:45 PM - 2:30 PM

2:30 PM HEP overview (30'+15')
- Philippe Gras (Université Paris-Saclay (FR))

2:30 PM - 3:15 PM

3:15 PM Coffee
3:15 PM - 3:45 PM

3:45 PM ML overview (30'+15')
- Tilman Plehn (Heidelberg University)

3:45 PM - 4:30 PM

4:30 PM Unbinned Discriminative ML methods overview (20'+20')
- Mariel Pettee (Lawrence Berkeley National Lab. (US))

4:30 PM - 5:10 PM

5:10 PM Unbinned Generative ML methods overview (20'+20')
- Nathan Huetsch

5:10 PM - 5:50 PM

5:50 PM Welcome Reception - Baker Street Pub
5:50 PM - 7:00 PM

Tuesday, June 11, 2024

9:00 AM Binned ML methods overview (20'+20')
- Jingjing Pan (Yale University (US))

9:00 AM - 9:40 AM

9:40 AM Performance / benchmarking with regularisation choice (20'+20')
- Lydia Brenner (Nikhef National institute for subatomic physics (NL))

9:40 AM - 10:20 AM

10:20 AM Coffee
10:20 AM - 10:50 AM

10:50 AM Simplified Template Cross Sections (STXS) (20+20)
- Rahul Balasubramanian (Centre National de la Recherche Scientifique (FR))

10:50 AM - 11:30 AM

11:30 AM Likelihood-based unfolding with the CMS Higgs combination tool (20+20)
- Alessandro Tarabini (ETH Zurich (CH))

11:30 AM - 12:10 PM
The CMS Higgs combination tool (Combine) is the software package used for statistical analyses by the CMS Collaboration. The package, originally designed to perform searches for a Higgs boson and the combined analysis of those searches, has evolved to become the statistical analysis tool presently used in the majority of measurements and searches performed by the CMS Collaboration. Since Combine has access to the full likelihood function, it can also be used to perform a likelihood-based unfolding. This approach has become the standard unfolding procedure for many analyses within the collaboration.

12:10 PM Lunch
12:10 PM - 1:30 PM

1:30 PM Unfolding is not unsmearing (20+20)
- Michael Schmelling (Max Planck Society (DE))

1:30 PM - 2:10 PM
In particle physics, unfolding methods are employed when the basis used to represent an estimate of the truth is not the basis with statistically independent expansion coefficients. For the discrete unfolding problem, the latter is given by the eigenvectors of the Fisher information matrix, which measures the amount of information carried by the data about the truth. In typical cases it is ill-conditioned, with the consequence that the measurements constrain only a small number of the expansion coefficients. This allows for highly efficient data reduction, but only for a biased estimate of the truth. Unfolding methods differ in how they bias the result. A way to quantify this is the posterior response matrix.

2:10 PM Unfolding in direct BSM search-sensitive regions of phase space (20'+20')
- Sarah Louise Williams (University of Cambridge (GB))

2:10 PM - 2:50 PM

2:50 PM Coffee
2:50 PM - 3:20 PM

3:20 PM Using unfolded data in global QCD analyses (20+20)
- Oleksandr Zenaiev (Hamburg University)

3:20 PM - 4:00 PM

4:00 PM Unfolding in the context of a heavy ion analysis (20+20)
- Molly Park (Massachusetts Inst. of Technology (US))

4:00 PM - 4:40 PM
This study examines unfolding in the context of a heavy ion photon-tagged jet analysis. The SVD and D'Agostini unfolding algorithms are compared, and the use of the mean squared error (MSE) to choose the regularization strength is demonstrated. Additionally, the bias associated with unfolding is investigated in relation to the choice of prior. The performance is evaluated with different theoretical models and the bottom-line test.

4:40 PM Response Matrix Estimation in Unfolding Differential Cross Sections (20+20)
- Richard Zhu (Carnegie Mellon University)

4:40 PM - 5:20 PM
In the unfolding problem, the response matrix is the forward operator that models the detector response. In practice, the response matrix is not known analytically. Instead, it needs to be estimated using Monte Carlo simulation, which introduces statistical uncertainty into the unfolding procedure. This raises the question of how to estimate the response matrix in a sensible way. In most analyses at the LHC, this is done by binning the events and counting the number of events that migrate between bins. However, this approach can suffer from undersmoothing, especially with a small sample size. To address this issue, we propose a two-step approach to response matrix estimation. First, we estimate the response kernel on the unbinned space. Second, we propagate the estimated response kernel into an integral equation to obtain an estimate for the response matrix.
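The conventional bin-and-count estimate described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions, not the speaker's method: the function name `binned_response_matrix`, the toy exponential truth spectrum, and the Gaussian smearing are all hypothetical choices, and the proposed two-step kernel approach is not implemented here.

```python
import numpy as np


def binned_response_matrix(truth, reco, truth_edges, reco_edges):
    """Estimate R[i, j] ~ P(reco bin i | truth bin j) by counting MC events.

    Hypothetical helper for illustration; not taken from any analysis code.
    """
    # Joint counts: rows index reconstructed bins, columns index truth bins.
    counts, _, _ = np.histogram2d(reco, truth, bins=[reco_edges, truth_edges])
    # Number of generated events per truth bin (the column normalisation).
    truth_totals, _ = np.histogram(truth, bins=truth_edges)
    # Avoid division by zero for empty truth bins; those columns stay zero.
    safe_totals = np.where(truth_totals > 0, truth_totals, 1)
    # Column sums below one reflect events reconstructed outside reco_edges.
    response = counts / safe_totals
    return response, counts


# Toy Monte Carlo sample (assumed shapes, chosen only to exercise the code).
rng = np.random.default_rng(0)
truth = rng.exponential(scale=2.0, size=5000)
reco = truth + rng.normal(scale=0.5, size=truth.size)  # toy Gaussian smearing
edges = np.linspace(0.0, 10.0, 11)

response, counts = binned_response_matrix(truth, reco, edges, edges)
```

With a steeply falling truth spectrum such as this one, the high-value truth bins contain few simulated events, so their columns in `response` are noisy. That is the small-sample behaviour the abstract attributes to the counting estimate.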

Wednesday, June 12, 2024

9:00 AM Dealing with Uncertainties (20+20)
- Kyle Cormier (University of Zurich (CH))

9:00 AM - 9:40 AM

9:40 AM Unfolding in the context of g-2 (20+20)
- Laurent Lellouch (CNRS and Aix-Marseille U.)

9:40 AM - 10:20 AM

10:20 AM Coffee
10:20 AM - 10:50 AM

10:50 AM Profile likelihood unfolding with large number of bins (20+20)
- David Walter (CERN)

10:50 AM - 11:30 AM
In this talk we will discuss previous measurements using binned maximum likelihood unfolding, focusing on analyses with large numbers of bins and nuisance parameters. We will outline the technical implementation and highlight the challenges and limitations. Furthermore, we showcase the newly developed approach of "linearized binned likelihood unfolding", a modified formalism that scales better and allows unfolding with even larger numbers of bins and nuisance parameters.

11:30 AM Moment Unfolding (20+20)
- Krish Desai

11:30 AM - 12:10 PM

12:10 PM Lunch
12:10 PM - 1:30 PM

1:30 PM - 5:30 PM

6:50 PM Conference dinner - Ciel de Paris
6:50 PM - 8:50 PM

Thursday, June 13, 2024

9:30 AM QUnfold: Quantum Annealing for Distributions Unfolding in High-Energy Physics (20+20)
- Simone Gasperini (Universita e INFN, Bologna (IT))
- Gianluca Bianco (Universita e INFN, Bologna (IT))

9:30 AM - 10:10 AM
In High-Energy Physics (HEP) experiments, each measurement apparatus exhibits a unique signature in terms of detection efficiency, resolution, and geometric acceptance. The overall effect is that the distribution of each observable measured in a given physical process can be smeared and biased. Unfolding is the statistical technique employed to correct for this distortion and restore the original distribution. This process is essential for meaningful comparisons between the outcomes of different experiments and theoretical predictions. The emerging technology of Quantum Computing represents an enticing opportunity to enhance the unfolding performance and potentially yield more accurate results. This work introduces QUnfold, a simple Python module designed to address the unfolding challenge by harnessing the capabilities of quantum annealing. In particular, the regularized log-likelihood minimization formulation of the unfolding problem is translated to a Quadratic Unconstrained Binary Optimization (QUBO) problem, solvable by quantum annealing systems. The algorithm is validated on a simulated sample of particle collision data generated by combining the MadGraph Monte Carlo event generator and the Delphes simulation software to model the detector response. A variety of fundamental kinematic distributions are unfolded and the results are compared with conventional unfolding algorithms commonly adopted in precision measurements at the Large Hadron Collider (LHC) at CERN. The implementation of the quantum unfolding model relies on the D-Wave Ocean software, and the algorithm is run on heuristic classical solvers as well as on the physical D-Wave Advantage quantum annealer, which has 5000+ qubits.

10:10 AM Unfolding using Denoising Diffusion (20+20)
- Camila Pazos (Tufts University (US))

10:10 AM - 10:50 AM
Unfolding detector distortions in experimental data is critical for enabling precision measurements in high-energy physics (HEP). However, traditional unfolding methods face challenges in scalability, flexibility, and dependence on simulations. We introduce a novel unfolding approach using conditional denoising diffusion probabilistic models (cDDPM). By modeling the conditional probability density between detector-level observations and truth-level particle properties from various physics processes, the cDDPM unfolding performance generalizes across varied simulated processes and kinematic distributions without retraining. We demonstrate a proof of concept on toy models and evaluate the approach on simulated Large Hadron Collider jets across different physics processes.

10:50 AM Coffee
10:50 AM - 11:20 AM

11:20 AM Full Event Particle-Level Unfolding with Variable Length Latent Variational Diffusion (20+20)
- Kevin Thomas Greif (University of California Irvine (US))
- Alexander Shmakov (University of California Irvine (US))

11:20 AM - 12:00 PM
Collisions at the Large Hadron Collider (LHC) provide information about the values of parameters in theories of fundamental physics. Extracting measurements of these parameters requires accounting for effects introduced by the particle detector used to observe the collisions. The typical approach is to use a high-fidelity simulation of the detector to generate synthetic datasets that can then be compared directly with experimental data. However, these simulations are often proprietary and computationally expensive. An alternative approach, *unfolding*, statistically adjusts the experimental data for detector effects. Traditional unfolding algorithms require binning data in a small set of pre-selected dimensions. Recent methods using generative machine learning models have shown promise for performing unbinned unfolding in high dimensions, allowing later computation of many observables. However, all current generative approaches are limited to unfolding a fixed set of observables, making them unable to perform *full-event* unfolding in the variable-dimensional environment of collider data. A novel modification to the variational latent diffusion model (VLD) approach to generative unfolding is presented, which allows for unfolding of high- and variable-dimensional feature spaces. The performance of this method is evaluated in the context of semi-leptonic $t\bar{t}$ production at the LHC. Additionally, the dependence of the unfolding on the training data prior is assessed by evaluating the model on datasets with alternative priors.

12:00 PM Lunch
12:00 PM - 1:30 PM

1:30 PM A 24-dimensional Cross-section Measurement with ATLAS (15+15)
- Mariel Pettee (Lawrence Berkeley National Lab. (US))

1:30 PM - 2:00 PM

2:00 PM Multidimensional cross-section measurements with H1 (15+15)
- Fernando Torales Acosta (Lawrence Berkeley National Lab. (US))

2:00 PM - 2:30 PM

2:30 PM Open discussion on challenges and possible solutions in unfolding
- Bogdan Malaescu (LPNHE-Paris CNRS/IN2P3 (FR))

2:30 PM - 3:00 PM

3:00 PM Closing of the meeting and summary
- Ben Nachman (Lawrence Berkeley National Lab. (US))

3:00 PM - 3:30 PM