I will present and discuss several proposed metrics, based on integral probability measures, for the evaluation of generative models (and, more generally, for the comparison of different generators). Some of these metrics can be computed particularly efficiently in parallel and show good performance. I will first compare the metrics on toy multivariate/multimodal distributions, and then focus...
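For orientation, here is a minimal NumPy sketch of one metric of this family, the maximum mean discrepancy (MMD) with a Gaussian kernel. It is not necessarily one of the metrics presented in the talk; the kernel, bandwidth and toy distributions are illustrative assumptions.

```python
import numpy as np

def mmd2_gaussian(x, y, sigma=1.0):
    """Estimate of the squared MMD between samples x and y (shape [n, d])
    using a Gaussian kernel of width sigma."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    kxx, kyy, kxy = k(x, x), k(y, y), k(x, y)
    n, m = len(x), len(y)
    # drop diagonal terms in the within-sample averages
    return ((kxx.sum() - np.trace(kxx)) / (n * (n - 1))
            + (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
            - 2 * kxy.mean())

# toy check: reference samples vs. a slightly shifted "generated" sample
rng = np.random.default_rng(0)
ref = rng.normal(size=(500, 2))
gen = rng.normal(loc=0.3, size=(500, 2))
print(mmd2_gaussian(ref, gen))
```

The kernel sums factorise over sample pairs, which is why metrics of this type parallelise well.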
The performance of machine learning classification algorithms is evaluated by estimating metrics, often derived from the confusion matrix, using training data and cross-validation. However, these estimates do not prove that the best possible performance has been achieved. Fundamental limits to error rates can be estimated using information distance measures. To this end, the confusion matrix has been...
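As context for the standard practice the abstract refers to, a minimal scikit-learn sketch of estimating confusion-matrix-based metrics with cross-validation; the dataset and classifier are placeholders, not those of the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix, classification_report

# placeholder data standing in for signal/background features
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
# out-of-fold predictions, so metrics are not computed on the training folds
y_pred = cross_val_predict(clf, X, y, cv=5)

print(confusion_matrix(y, y_pred))
print(classification_report(y, y_pred))
```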
Data analyses in the high-energy particle physics (HEP) community increasingly exploit advanced multivariate methods to separate signal from background processes. In this talk, a maximally unbiased, in-depth comparison of the graph neural network (GNN) architecture, which is increasingly popular in the HEP community, with the already well-established technology of fully connected...
We present a method to accelerate Effective Field Theory reinterpretations using interpolated likelihoods. By employing Radial Basis Functions for interpolation and Gaussian Processes to strategically select interpolation points, we show that we can reduce the computational burden while maintaining accuracy. We apply this in the context of the Combined Higgs Boson measurement at CMS, a complex...
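A minimal sketch of the interpolation ingredient: scipy's RBFInterpolator fitted on a handful of evaluated likelihood points. The toy two-dimensional "likelihood" and the random choice of evaluation points are assumptions for illustration; the Gaussian-Process-driven point selection described in the abstract is not reproduced here.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# toy stand-in for an expensive likelihood over two EFT coefficients
def nll(c):
    return 0.5 * (c[..., 0] ** 2 + 2.0 * c[..., 1] ** 2 + c[..., 0] * c[..., 1])

rng = np.random.default_rng(1)
pts = rng.uniform(-2, 2, size=(40, 2))   # points where the full likelihood was evaluated
vals = nll(pts)

interp = RBFInterpolator(pts, vals, kernel="thin_plate_spline")

query = np.array([[0.5, -0.3], [1.0, 1.0]])
print(interp(query), nll(query))         # interpolated vs. exact toy values
```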
The Multi-disciplinary Use Cases for Convergent Approaches to AI Explainability (MUCCA) project is pioneering efforts to enhance the transparency and interpretability of AI algorithms in complex scientific fields. This study focuses on the application of Explainable AI (XAI) in high-energy physics (HEP), utilising a range of machine learning (ML) methodologies, from classical boosted decision...
Bayesian model selection provides a powerful framework for objectively comparing models directly from observed data, without reference to ground truth data. However, Bayesian model selection requires the computation of the marginal likelihood (model evidence), which is computationally challenging, prohibiting its use in many high-dimensional Bayesian inverse problems. With Bayesian imaging...
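To make the central quantity concrete, a small sketch that computes the marginal likelihood by direct numerical integration for a one-dimensional toy problem; this brute-force approach is feasible only in very low dimensions, which is precisely the limitation the abstract addresses. The Gaussian likelihood and priors are assumptions for illustration.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# toy data and two candidate models for its mean
data = np.array([0.8, 1.1, 0.9, 1.3, 1.0])

def evidence(prior_mean, prior_sd, sigma=0.5):
    """Marginal likelihood Z = integral of p(data | mu) p(mu) dmu for a Gaussian model."""
    def integrand(mu):
        return np.prod(stats.norm.pdf(data, mu, sigma)) * stats.norm.pdf(mu, prior_mean, prior_sd)
    z, _ = quad(integrand, -10, 10)
    return z

z1 = evidence(prior_mean=0.0, prior_sd=1.0)   # model 1: mean near 0
z2 = evidence(prior_mean=1.0, prior_sd=1.0)   # model 2: mean near 1
print("Bayes factor (model 2 vs model 1):", z2 / z1)
```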
We present an application of Simulation-Based Inference (SBI) in collider physics, aiming to constrain anomalous interactions beyond the Standard Model (SM). This is achieved by leveraging Neural Networks to learn otherwise intractable likelihood ratios. We explore methods to incorporate the underlying physics structure into the likelihood estimation process. Specifically, we compare two...
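A minimal sketch of the underlying idea, the "likelihood-ratio trick": a classifier trained to separate samples generated under two hypotheses yields an estimate of their likelihood ratio via s/(1-s). The one-dimensional toy distributions below stand in for simulated collider events and are purely illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# toy "events" simulated under the SM (label 0) and a BSM hypothesis (label 1)
x_sm = rng.normal(0.0, 1.0, size=(5000, 1))
x_bsm = rng.normal(0.4, 1.2, size=(5000, 1))
X = np.vstack([x_sm, x_bsm])
y = np.concatenate([np.zeros(5000), np.ones(5000)])

clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500).fit(X, y)

x_test = np.array([[0.0], [1.0], [2.0]])
s = clf.predict_proba(x_test)[:, 1]
ratio = s / (1.0 - s)          # estimate of p_BSM(x) / p_SM(x)
print(ratio)
```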
A key challenge in the field of AI is to make machine-assisted discovery interpretable, enabling it not only to uncover correlations but also to improve our physical understanding of the world. A nascent branch of machine learning -- Symbolic Regression (SR) -- aims to discover the optimal functional representations of datasets, producing perfectly interpretable outputs (equations) by...
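A purely illustrative toy of the idea behind SR: score a small set of candidate functional forms against data and keep the best one. Real SR tools typically search far larger expression spaces, often with genetic programming; the data, candidate grammar and fitting scheme below are assumptions.

```python
import numpy as np

# toy data generated by a hidden law: y = 3*x**2 + noise
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 200)
y = 3 * x ** 2 + rng.normal(0, 0.1, 200)

# tiny expression "grammar": candidate basis functions with one fitted coefficient
candidates = {
    "a*x":      lambda a, x: a * x,
    "a*x**2":   lambda a, x: a * x ** 2,
    "a*sin(x)": lambda a, x: a * np.sin(x),
    "a*exp(x)": lambda a, x: a * np.exp(x),
}

def fit_and_score(f):
    # least-squares fit of the single coefficient a, then mean squared error
    basis = f(1.0, x)
    a = np.dot(basis, y) / np.dot(basis, basis)
    return a, np.mean((y - f(a, x)) ** 2)

results = {name: fit_and_score(f) for name, f in candidates.items()}
best = min(results, key=lambda k: results[k][1])
print(best, results[best])    # expect "a*x**2" with a close to 3
```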
Precision measurements at the Large Hadron Collider (LHC), such as the measurement of the top quark mass, are essential for advancing our understanding of fundamental particle physics. Profile likelihood fits have become the standard method to extract physical quantities and parameters from the measurements. These fits incorporate nuisance parameters to include systematic uncertainties. The...
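To illustrate the fit setup (not the actual top-mass analysis), a small scipy sketch of a profile likelihood: for each value of the parameter of interest, the negative log-likelihood is minimised over a nuisance parameter with a Gaussian constraint. The toy data model, the systematic shift and all numbers are assumptions.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
data = rng.normal(loc=172.5 + 0.8, scale=1.5, size=200)   # toy "measurements"

def nll(m_top, theta):
    """Toy NLL: Gaussian data model shifted by a nuisance parameter theta,
    with a unit-Gaussian constraint (penalty) term on theta."""
    return -np.sum(stats.norm.logpdf(data, loc=m_top + theta, scale=1.5)) + 0.5 * theta ** 2

def profile_nll(m_top):
    res = minimize_scalar(lambda t: nll(m_top, t), bounds=(-5, 5), method="bounded")
    return res.fun

scan = np.linspace(171, 175, 41)
prof = np.array([profile_nll(m) for m in scan])
print("profiled minimum at m_top ~", scan[np.argmin(prof)])
```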
Type Ia supernovae (SNe Ia) are thermonuclear exploding stars that can be used to put constraints on the nature of our universe. One challenge with population analyses of SNe Ia is Malmquist bias, where we preferentially observe the brighter SNe due to limitations of our telescopes. If untreated, this bias can propagate through to our posteriors on cosmological parameters. In this work, we...
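A small simulation sketch of the bias being described: applying a survey magnitude limit to a population of standard candles preferentially keeps the brighter objects and biases the inferred mean absolute magnitude. All numbers are illustrative assumptions, not the values used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean_mag = -19.3            # toy mean absolute magnitude of SNe Ia
intrinsic_scatter = 0.4

abs_mag = rng.normal(true_mean_mag, intrinsic_scatter, 100_000)
distance_modulus = rng.uniform(34, 40, 100_000)      # toy distances
apparent_mag = abs_mag + distance_modulus

limit = 24.0                                         # toy survey magnitude limit
observed = apparent_mag < limit                      # brighter = smaller magnitude

print("true mean M:", abs_mag.mean())
print("mean M of observed sample:", abs_mag[observed].mean())  # biased bright
```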
Neural networks are increasingly used to emulate complex simulations due to their speed and efficiency. Unfortunately, many ML algorithms, including (deep) neural networks, lack interpretability. If machines predict something humans do not understand, how can we check (and trust) the results? Even if we could identify potential mistakes, current methods lack effective mechanisms to correct...
A new generation of astronomical surveys, such as the recently launched European Space Agency's Euclid mission, will soon deliver exquisite datasets with unparalleled amounts of cosmological information, poised to change our understanding of the Universe. However, analysing these datasets presents unprecedented statistical challenges. Multiple systematic effects need to be carefully accounted...
How much cosmological information can we reliably extract from existing and upcoming large-scale structure observations? Many summary statistics fall short in describing the non-Gaussian nature of the late-time Universe and modelling uncertainties from baryonic physics. Using simulation-based inference (SBI) with automatic data compression from graph neural networks, we learn optimal summary...
Many physics analyses at the LHC rely on algorithms to remove detector effects, commonly known as unfolding. Whereas classical methods only work with binned, one-dimensional data, Machine Learning promises to overcome both limitations. Using a generative unfolding pipeline, we show how it can be built into an existing LHC analysis designed to measure the top mass. We discuss the model-dependence...
We introduce Noise Injection Node Regularization (NINR), a method that injects structured noise into Deep Neural Networks (DNNs) during the training stage, resulting in an emergent regularizing effect. We present both theoretical and empirical evidence demonstrating substantial improvements in robustness against various test data perturbations for feed-forward DNNs trained under NINR. The...
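A generic sketch of the noise-injection idea, not the authors' exact NINR implementation: an extra input node is fed structured noise during training and silenced at test time; the architecture, noise scale and framework (PyTorch) are assumptions for illustration.

```python
import torch
import torch.nn as nn

class NoisyInputNet(nn.Module):
    """Feed-forward net with one extra input node driven by noise during training."""
    def __init__(self, n_features, noise_scale=0.5):
        super().__init__()
        self.noise_scale = noise_scale
        self.net = nn.Sequential(
            nn.Linear(n_features + 1, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x):
        if self.training:
            noise = self.noise_scale * torch.randn(x.shape[0], 1, device=x.device)
        else:
            noise = torch.zeros(x.shape[0], 1, device=x.device)  # silent node at test time
        return self.net(torch.cat([x, noise], dim=1))

model = NoisyInputNet(n_features=10)
x = torch.randn(32, 10)
print(model(x).shape)        # torch.Size([32, 1])
```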
Background modeling is one of the critical elements of searches for new physics at experiments at the Large Hadron Collider. In many searches, backgrounds are modeled using analytic functional forms. Finding an acceptable function can be complicated, inefficient and time-consuming. This poster presents a novel approach to estimating the underlying PDF of a 1D dataset of samples using Log...
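A rough sketch of the general idea of a log-Gaussian-Process density estimate, not the poster's specific method: fit a Gaussian process to the logarithm of binned counts, exponentiate, and normalise. The falling toy spectrum, binning and kernel choices are assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# toy 1D "background" sample (a falling, invariant-mass-like spectrum)
rng = np.random.default_rng(0)
samples = rng.exponential(scale=50.0, size=20_000) + 100.0

counts, edges = np.histogram(samples, bins=60, range=(100, 400))
centers = 0.5 * (edges[:-1] + edges[1:])
mask = counts > 0

# fit a GP to the log of the bin counts, then exponentiate for a smooth shape
gp = GaussianProcessRegressor(kernel=RBF(length_scale=30.0) + WhiteKernel(1e-2),
                              normalize_y=True)
gp.fit(centers[mask, None], np.log(counts[mask]))

grid = np.linspace(100, 400, 300)
shape = np.exp(gp.predict(grid[:, None]))
pdf = shape / (shape.sum() * (grid[1] - grid[0]))   # normalise to a PDF estimate
print(pdf[:5])
```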
The matrix element method is the LHC inference method of choice for limited statistics, as it allows for optimal use of the available information. We present a dedicated machine learning framework built on efficient phase-space integration and a learned acceptance and transfer function. It uses a choice of INN and diffusion networks, together with a transformer to solve the jet combinatorics. We showcase...
Recent innovations from machine learning allow for data unfolding, without binning and including correlations across many dimensions. We describe a set of known, upgraded, and new methods for ML-based unfolding. The performance of these approaches is evaluated on the same two datasets. We find that all techniques are capable of accurately reproducing the particle-level spectra across complex...
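A minimal sketch of one ingredient shared by several such methods, classifier-based reweighting of the simulation (in the spirit of one step of an OmniFold-style iteration). The toy detector smearing and datasets are assumptions for illustration, not the two datasets of the study.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# toy: particle-level truth smeared by a "detector" into a reco-level observable
sim_truth = rng.exponential(1.0, 50_000)
sim_reco = sim_truth + rng.normal(0, 0.3, 50_000)
data_truth = rng.exponential(1.2, 50_000)        # nature differs from the simulation
data_reco = data_truth + rng.normal(0, 0.3, 50_000)

# one reweighting step: classify sim reco vs. data reco, reweight the simulation
X = np.concatenate([sim_reco, data_reco])[:, None]
y = np.concatenate([np.zeros_like(sim_reco), np.ones_like(data_reco)])
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300).fit(X, y)
p = clf.predict_proba(sim_reco[:, None])[:, 1]
w = p / (1 - p)                                  # weights pulled back to particle level

print("sim truth mean:", sim_truth.mean(),
      "reweighted:", np.average(sim_truth, weights=w),
      "target:", data_truth.mean())
```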
We present a detailed comparison of multiple interpolation methods to characterize the amplitude distribution of several Higgs boson production modes at the LHC. Apart from standard interpolation techniques, we develop a new approach based on the use of the Lorentz Geometric Algebra Transformer (L-GATr). L-GATr is an equivariant neural network that is able to encode Lorentz and permutation...
Scattering transforms are a new type of summary statistics recently developed for the study of highly non-Gaussian processes, which have been shown to be very promising for astrophysical studies. In particular, they allow one to build generative models of complex non-linear fields from a limited amount of data, and have also been used as the basis of new statistical component separation...
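A deliberately simplified sketch of the idea behind first-order scattering coefficients in 1D: band-pass filter the field at several scales, take the modulus, and average. The Gaussian band-pass filters, scales and toy fields are assumptions; a real scattering transform uses proper wavelet families and higher-order coefficients.

```python
import numpy as np

def first_order_scattering(x, scales=(2, 4, 8, 16)):
    """Toy 1D first-order scattering-like coefficients:
    band-pass at several scales, modulus, then spatial average."""
    n = len(x)
    freqs = np.fft.rfftfreq(n)
    X = np.fft.rfft(x)
    coeffs = []
    for s in scales:
        # Gaussian band-pass centred at frequency 1/s (a crude wavelet stand-in)
        h = np.exp(-0.5 * ((freqs - 1.0 / s) / (0.5 / s)) ** 2)
        band = np.fft.irfft(X * h, n)
        coeffs.append(np.abs(band).mean())       # modulus + average
    return np.array(coeffs)

rng = np.random.default_rng(0)
gaussian_field = rng.normal(size=1024)
non_gaussian_field = rng.normal(size=1024) ** 3  # toy non-Gaussian process
print(first_order_scattering(gaussian_field))
print(first_order_scattering(non_gaussian_field))
```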
The Multi-disciplinary Use Cases for Convergent new Approaches to AI explainability (MUCCA) project is pioneering efforts to enhance the transparency and interpretability of AI algorithms in complex scientific endeavours. The presented study focuses on the role of Explainable AI (xAI) in the domain of high-energy physics (HEP). Approaches based on Machine Learning (ML) methodologies, from...
Consider a binary mixture model of the form f(x) = (1 − ε) φ(x) + ε g(x), where φ is standard normal and g is a completely specified heavy-tailed distribution with the same support. Gaussianity of φ reflects a reduction of the raw data to a set of pivotal test statistics at each site (e.g. an energy level in a particle physics context). For a sample of n independent and identically distributed values x_1, ..., x_n, the maximum likelihood...
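A minimal numerical sketch of this setup: maximum-likelihood estimation of the mixture fraction ε for a standard normal component mixed with a fixed heavy-tailed component. The choice of a Cauchy distribution for g, the sample size and the true ε are illustrative assumptions only.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
n, eps_true = 5000, 0.05
is_signal = rng.random(n) < eps_true
x = np.where(is_signal,
             stats.cauchy.rvs(size=n, random_state=rng),   # heavy-tailed component g
             rng.standard_normal(n))                       # standard normal component

def nll(eps):
    mix = (1 - eps) * stats.norm.pdf(x) + eps * stats.cauchy.pdf(x)
    return -np.sum(np.log(mix))

res = minimize_scalar(nll, bounds=(0.0, 1.0), method="bounded")
print("MLE of mixture fraction:", res.x)
```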