Machine learning has become a hot topic in particle physics over the past several years. In particular, there has been a lot of progress in the area of particle and event identification, reconstruction, fast simulation and others. One significant area of research and development has focused on jet physics. In this workshop, we will discuss current progress in this area, focusing on new breakthrough ideas and existing challenges. The ML4Jets workshop will be open to the full community and will include LHC experiments as well as theorists and phenomenologists interested in this topic. This year's workshop is hosted at the University of Heidelberg. It follows workshops in 2017, 2018 and 2020.
Local Organizers:
Anja Butter
Barry Dillon
Ullrich Köthe
Tilman Plehn
Hans-Christian Schultz-Coulon
International Organization Committee:
Kyle Cranmer (NYU)
Ben Nachman (LBNL)
Maurizio Pierini (CERN)
Tilman Plehn (Heidelberg)
Jesse Thaler (MIT)
Download the full resolution PDF (7MB) here:
https://indico.cern.ch/event/980214/attachments/2153635/3632000/ML4jets_poster.pdf
Jets originating from bottom quarks, b-jets, are of particular interest in high energy physics. While b-jets are similar to other jets, they have certain qualities that present unique challenges in the context of machine learning. Generally, there is an underlying rotational symmetry of the particles about a jet’s axis. However, in the case of b-jets, some of the most discriminating observables, the transverse and longitudinal impact parameters, are not compatible with this symmetry. To address this, we instead consider the full 3-dimensional vector-valued impact parameter as the relevant observable for b-tagging. We then compare the b-tagging performance of a novel SO(3)-equivariant neural network against a standard baseline architecture of similar complexity. A preliminary study using DELPHES-simulated events with track smearing shows that the equivariant network significantly outperformed the baseline model. In particular, the equivariant model was less prone to over-training and improved background rejection overall and by over 1.5x at the 70% working point.
The Energy Flow Network (EFN) is a neural network architecture that represents jets as point clouds and enforces infrared and collinear (IRC) safety on its outputs. In this talk, I will introduce a new variant of the EFN architecture based on the Deep Sets formalism, incorporating permutation-equivariant layers. I will discuss the conditions under which IRC safety can be maintained in the new architecture and showcase the performance of these networks on the canonical example of W-boson tagging. The equivariant EFNs have similar performance to Particle Flow Networks, which are superior to standard EFNs. Finally, I will comment on how the equivariant networks sculpt the jet mass compared to unaugmented EFNs.
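As a rough illustration of why an energy-weighted Deep Sets sum is IRC safe, the sketch below builds the latent vector sum_i z_i Phi(y_i, phi_i) of a standard EFN with a hand-written per-particle map. The function names, the choice of map and the toy jets are assumptions made for this example; they are not the architecture presented in the talk, and the permutation-equivariant layers are not shown.

```python
def per_particle_map(y, phi):
    """Hypothetical stand-in for the learned map Phi (angles -> latent vector)."""
    return [1.0, y, phi, y * y + phi * phi]


def efn_latent(particles):
    """Energy-weighted sum over particles: sum_i z_i * Phi(y_i, phi_i).

    `particles` is a list of (pt, y, phi) tuples; z_i is the pt fraction.
    """
    total_pt = sum(pt for pt, _, _ in particles)
    latent = [0.0, 0.0, 0.0, 0.0]
    for pt, y, phi in particles:
        z = pt / total_pt
        for k, value in enumerate(per_particle_map(y, phi)):
            latent[k] += z * value
    return latent


# Toy jets (assumed values): the original, a collinear splitting of the
# first particle, and the original plus one extremely soft emission.
jet = [(100.0, 0.1, 0.2), (50.0, -0.3, 0.1)]
collinear_split = [(60.0, 0.1, 0.2), (40.0, 0.1, 0.2), (50.0, -0.3, 0.1)]
soft_emission = jet + [(1e-9, 1.0, 1.0)]
```

Because the weights z_i enter linearly, `efn_latent` returns the same vector (up to rounding) for `jet`, for any reordering of it, and for `collinear_split`, and it changes only negligibly for `soft_emission`. A Particle Flow Network drops the linear weighting, which is why it is not IRC safe in general.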
One of the most ubiquitous challenges in analyses at the LHC is event reconstruction, whereby heavy resonance particles (such as top quarks, Higgs bosons, or vector bosons) must be reconstructed from the detector signatures left behind by their decay products. This is particularly challenging when all decay products have similar or identical signatures, such as all-jet events. Existing methods typically require the evaluation of every possible permutation of these events to find the "best" assignment. In this work, we present a novel neural network architecture, "SPANet" (Symmetry Preserving Attention networks). By casting this problem as a set assignment problem on a variable size set, and embedding our knowledge of the symmetries in the problem into the neural network architecture, we demonstrate that these problems can be solved efficiently even in cases for which existing methods are intractable. We demonstrate the approach using a suite of progressively more complex benchmarks, going from all-hadronic ttbar, via ttH, to 4top final states, and provide an easy-to-use and flexible software package to design and train networks for arbitrary final states.
We introduce the Particle Convolution Network (PCN), a new type of equivariant neural network layer suitable for many tasks in jet physics. The particle convolution layer can be viewed as an extension of Deep Sets and Energy Flow network architectures, in which the permutation-invariant operator is promoted to a group convolution. While the PCN can be implemented for various kinds of symmetries, we consider the specific case of rotation in the $\eta - \phi$ plane. In two standard benchmark tasks, q/g tagging and top tagging, we show that the rotational PCN (rPCN) achieves performance comparable to graph networks such as ParticleNet. Moreover, we show that it is possible to implement an IRC-safe rPCN, which significantly outperforms existing IRC-safe tagging methods on both tasks. We speculate that by generalizing the PCN to include additional convolutional symmetries relevant to jet physics, it may outperform the current state-of-the-art set by graph networks, while allowing some control over physically-motivated inductive biases.
Optimal Transport has been applied to jet physics for the computation of distance between collider events. Here we generalize the Energy Mover’s Distance to include both the balanced Wasserstein-2 (W2) distance and the unbalanced Hellinger-Kantorovich (HK) distance. Whereas the W2 distance only allows for mass to be transported, the HK distance allows mass to be transported, created and destroyed, therefore naturally incorporating the total $p_T$ difference of the jets. Both distances enjoy a weak Riemannian structure and thus admit linear approximation. Such a linear framework significantly reduces the computational cost and in addition provides a Euclidean embedding amenable to simple machine learning algorithms and visualization techniques downstream. Here we demonstrate the benefit of this linear approach for jet classification and study its behavior in the presence of pileup.
Secondary vertex reconstruction is a key intermediate step in building powerful jet classifiers. We use a neural network to perform vertex finding inside jets in order to improve classification performance. This can be thought of as a supervised attention mechanism - directing the classifier towards the relevant information inside the jet. We show supervised attention outperforms an identical network with standard unsupervised attention.
https://arxiv.org/abs/2008.02831
In high energy heavy-ion collisions the substructure of jets is modified compared to that in proton-proton collisions due to the presence of the quark-gluon plasma (QGP). This modification of jets in the QGP is called "jet quenching". We employ machine learning techniques to quantify how much information about this process is within the substructure observables. We formulate the question as a binary classification problem where the machine is trained to learn information that distinguishes jets in proton-proton and heavy-ion collisions. We perform the classification task using $i)$ deep sets which directly include Infrared-Collinear (IRC) unsafe information, and $ii)$ a complete basis of IRC safe jet substructure observables which is passed to a Dense Neural Network (DNN). From the trained DNN, we identify optimal observables using symbolic regression. We perform our analysis using parton shower event generator models, and outline possible future directions to apply these methods directly to the raw data. We expect that the automated design of observables for heavy-ion collisions can provide new guidance for inferring properties of QGP from jet substructure data. In addition, the proposed framework for jets can be extended to study event-wide properties of any nuclear collisions - in particular, to study electron-ion collisions at the future Electron-Ion Collider.
Jets of collimated particles originating from hard scattered partons are utilized in a wide range of analyses in high energy physics. Our study is focused on identifying jets originating from heavy quarks. We introduce a novel approach to tagging heavy-flavor jets at collider experiments utilizing the information contained within jet constituents via the JetVLAD model architecture. This model is based on the concept of Vectors of Locally Aggregated Descriptors, which takes a set of feature descriptors as an input and returns a fixed-length feature vector that characterizes each set. We show the performance of this model as characterized by common metrics and showcase its ability to extract a high-purity heavy-flavor jet sample at various realistic jet momenta and production cross-sections. The method was demonstrated on PYTHIA generated proton-proton collisions at center-of-mass energies of 200 and 510 GeV.
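To make the "set of descriptors in, fixed-length vector out" idea concrete, here is a minimal sketch of classic hard-assignment VLAD aggregation. The cluster centres, descriptor values and function name are assumptions for illustration; the JetVLAD model itself may use a learned, soft assignment, which is not shown here.

```python
import math


def vlad(descriptors, centres):
    """Hard-assignment VLAD: sum residuals to the nearest centre, then flatten.

    The output length is len(centres) * dim, independent of how many
    descriptors (e.g. jet constituents) are passed in.
    """
    dim = len(centres[0])
    residual_sums = [[0.0] * dim for _ in centres]
    for d in descriptors:
        # Index of the nearest centre in squared Euclidean distance.
        nearest = min(
            range(len(centres)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(d, centres[j])),
        )
        for i in range(dim):
            residual_sums[nearest][i] += d[i] - centres[nearest][i]
    flat = [v for row in residual_sums for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    # L2-normalise so that sets of different size are comparable.
    return [v / norm for v in flat] if norm > 0.0 else flat


# Assumed toy inputs: two centres in a 2D feature space.
centres = [(0.0, 0.0), (1.0, 1.0)]
small_set = [(0.1, 0.0), (0.9, 1.0), (1.2, 1.1)]
large_set = small_set + [(0.2, -0.1), (0.8, 0.7)]
```

Both `vlad(small_set, centres)` and `vlad(large_set, centres)` have four entries, so jets with different constituent multiplicities map to vectors of the same length that a downstream classifier can consume.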
TBC
Experiments at a future $e^{+}e^{-}$ collider will be able to search for new particles with masses below the nominal centre-of-mass energy by analyzing collisions with initial-state radiation (radiative return). We show that machine learning methods based on semisupervised and weakly supervised learning can achieve model-independent sensitivity to the production of new particles in radiative return events. In addition to a first application of these methods in $e^{+}e^{-}$ collisions, our study is the first to combine weak supervision with high-dimensional information by deploying a deep sets neural network architecture. We have also investigated some of the experimental aspects of anomaly detection in radiative return events and discuss these in the context of future detector design.
We propose Classifier-based Anomaly detection THrough Outer Density Estimation (CATHODE), a new approach to search for resonant new physics at the LHC in a model-agnostic way. In CATHODE, we train a conditional density estimator on additional features in the sideband region, interpolate it into the signal region, and sample from it. This produces in a data-driven way events that follow the SM background model without any reliance on simulation. Then we train a classifier to distinguish background events and data events in the signal region to find anomalies. Using the LHCO R&D dataset, we show that CATHODE can discover new physics that would otherwise be hiding in the data, improving the nominal statistical significance in a specific example from $\sim 1\sigma$ to as much as $\sim 15\sigma$.
We explore the robustness of the CATHODE (Classifier-based Anomaly detection THrough Outer Density Estimation) method against correlation in the input features. We also compare CATHODE to other related approaches, specifically ANODE and CWoLa Hunting. Using the LHCO R&D dataset, we will demonstrate that in the absence of feature correlations, CATHODE outperforms both ANODE and CWoLa Hunting, and even approaches the performance of a supervised classifier trained to distinguish data from background. Meanwhile, in the presence of feature correlations, CWoLa Hunting breaks down, while ANODE is robust. Here we demonstrate that CATHODE is also robust against correlations, maintaining its spectacular performance.
An unsupervised learning tool that searches for localized, overdense regions of the copula space of a multidimensional feature space is discussed. The algorithm, named RanBox, exists in two versions - one which searches multiple times in random subspaces (typically of 8 to 12 dimensions) of the feature space, and a second one (RanBoxIter) which iteratively adds dimensions to the searched space. Gradient descent is used to localize the multi-dimensional interval which maximizes a suitable test statistic proportional to the significance of the observed data in the box. Applications to UCI datasets from fundamental physics and from fraud detection are discussed.
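The move to "copula space" can be sketched as a rank transform that makes every marginal distribution uniform on (0, 1), so that overdensities reflect only the dependence between features. The snippet below is a minimal sketch of that preprocessing step; the function name and the (rank + 0.5) / n convention are assumptions, and the box search itself is not shown.

```python
def to_copula_space(rows):
    """Replace each feature value by its fractional rank within that feature.

    `rows` is a list of events, each a list of feature values. Ties are
    broken by input order, since Python's sort is stable.
    """
    n = len(rows)
    n_features = len(rows[0])
    out = [[0.0] * n_features for _ in rows]
    for j in range(n_features):
        order = sorted(range(n), key=lambda i: rows[i][j])
        for rank, i in enumerate(order):
            out[i][j] = (rank + 0.5) / n
    return out


# Assumed toy data: three events with two features on very different scales.
events = [[10.0, 3.0], [200.0, 1.0], [30.0, 2.0]]
uniformised = to_copula_space(events)
```

After the transform both features live on the same scale regardless of their original units, which is what allows a single multi-dimensional interval to be compared against a uniform expectation.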
Invertible Neural Networks (INNs) are an extremely versatile class of generative models. Their invertibility allows for exact modelling of probability densities, computation of information-theoretic quantities, interpretable and disentangled features, among other things. Due to these properties, INNs have seen growing adoption in recent years, especially in natural sciences and engineering disciplines. In this talk, we present a number of examples for successful applications of INN-specific methods to real-world problems, covering various scientific fields beyond particle physics.
As the use of Machine Learning techniques becomes more widespread within High Energy Physics it is important to consider how the results from Neural Networks can be applied within hypothesis testing. We show how a Log-Likelihood Ratio test can be performed using the output of Neural Network classifiers trained on different physical datasets to yield a detection significance between two simple hypotheses, which we find to be superior to a common naive result. We also show how a generalised Log-Likelihood Ratio test can be performed using the output of a Variational Autoencoder when one hypothesis is not fully known beforehand, thereby providing a discovery significance that is useful in anomaly detection scenarios, which we showcase in the example of a Heavy Higgs EFT search.
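One common way to connect a classifier to a likelihood-ratio test is the so-called likelihood-ratio trick: a well-calibrated classifier trained on balanced samples outputs s(x) close to p1(x) / (p0(x) + p1(x)), so s / (1 - s) estimates p1(x) / p0(x). The sketch below sums the per-event log ratios; it assumes independent events and a calibrated classifier, and it is not necessarily the construction used by the authors.

```python
import math


def log_likelihood_ratio(scores, eps=1e-12):
    """Sum of log(s / (1 - s)) over independent events.

    `scores` are classifier outputs in [0, 1]; they are clipped to
    [eps, 1 - eps] to avoid taking the log of zero.
    """
    total = 0.0
    for s in scores:
        s = min(max(s, eps), 1.0 - eps)
        total += math.log(s / (1.0 - s))
    return total


# Assumed toy scores for a small dataset.
example_scores = [0.8, 0.6, 0.3]
test_statistic = log_likelihood_ratio(example_scores)
```

Scores of exactly 0.5 contribute nothing to the sum, as expected for events that do not discriminate between the two hypotheses. Turning the test statistic into a significance still requires its distribution under each hypothesis, for example from pseudo-experiments.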
For simulations where the forward and the inverse directions have a physics meaning, invertible neural networks are especially useful. A conditional INN can invert a detector simulation in terms of high-level observables, specifically for ZW production at the LHC. It allows for a per-event statistical interpretation. Next, we allow for a variable number of QCD jets. We unfold detector effects and QCD radiation to a pre-defined hard process, again with a per-event probabilistic interpretation over parton-level phase space.
We present the machine learning methodology that is the backbone of the new release of the NNPDF family of parton distribution functions. The new methodology introduces state of the art machine learning techniques such as stochastic gradient descent for neural network training which results in a major reduction in computational costs, and an automated optimization of the hyperparameters which reduces a source of bias. We further show how correlations between PDF sets can be used to assess the efficiency of the methodology, and why the use of correlations for the combination of different PDFs into a joint set could lead to severely distorted results. We discuss the "future test", a recently developed method of validating the generalization power of the methodology, which checks whether the PDF uncertainties, in regions in which the PDFs are not constrained by current data, are compatible with future data.
A central challenge in jet physics is that the evolution of the jet is an unobserved, latent process. In a semi-classical parton shower, this corresponds to a sequence of 1-to-2 splittings that form a tree-like showering history. Framing jet physics in probabilistic terms is attractive as it provides a principled framework to think about tasks as diverse as clustering, classification, parton shower tuning, matrix element/parton shower matching, and event generation of jets in complex, signal-like regions of phase space. Unfortunately, this usually involves either marginalizing (summing) or maximizing (searching) over the enormous space of clustering histories, which is typically intractable. We review three recently published works that address these challenges by building on techniques from statistics, machine learning, and combinatorial optimization. Each of these works is enabled by Ginkgo, a simplified, generative model for jets, designed to facilitate research in this area. We show how probabilistic programming can be used to efficiently sample the showering process, how a novel trellis algorithm can be used to efficiently marginalize over the enormous number of showering histories for the same observed particles, and how dynamic programming, A* search, and reinforcement learning can be used to find the maximum likelihood clustering in this enormous search space.
Tuning parton shower models to data is an important task for HEP experiments. We are performing exploratory research for what tuning the parton shower might look like if the parton shower were described by a generative model with a tractable likelihood, which might be implemented with a hybrid of theoretically-motivated components or generic neural network components. For this work we consider the Ginkgo model, which is a simplified parton shower with 1-to-2 splittings and a tractable likelihood that has been designed to facilitate this research. While the parton shower is traditionally tuned by matching one dimensional projections for various observables, ideally we would tune it with a maximum likelihood fit. The challenge is that the likelihood for the data given the shower parameters must marginalize over the (2N-3)!! possible showering histories, where N is the number of jet constituents. We demonstrate that with the hierarchical cluster trellis we can exactly marginalize over this enormous space of showering histories and fit the parameters of the Ginkgo model.
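To give a sense of scale for the (2N-3)!! histories quoted above, the short sketch below evaluates the double factorial for a few constituent multiplicities. The function name is an assumption for this example; it only counts histories and does not implement the cluster trellis.

```python
def n_showering_histories(n_constituents):
    """Return (2N-3)!!, the number of binary showering histories for N leaves."""
    if n_constituents < 1:
        raise ValueError("need at least one jet constituent")
    result = 1
    # Product of the odd numbers 3, 5, ..., 2N-3 (empty product for N <= 2).
    for k in range(3, 2 * n_constituents - 2, 2):
        result *= k
    return result


for n in (2, 4, 10, 20):
    print(n, n_showering_histories(n))
```

The count is 15 for four constituents and already exceeds thirty million for ten, which is why brute-force enumeration quickly becomes impractical and an exact marginalization scheme is attractive.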
In the last several years, the ML4Jets community has worked to improve performance for jet tagging and performed a number of comparisons of different architectures for jet tagging and other tasks. We have seen that combining multiple classifiers together into a meta-tagger or an ensemble improves performance. But is there still room for improvement? In other words, are we approaching the performance of the optimal tagger? Formally, the optimal classifier is defined by a likelihood ratio (Neyman-Pearson lemma), but the likelihood for the observed jet is typically intractable as it involves marginalizing over the enormous number of showering histories. Additionally, the likelihood for a particular shower is, in general, not easily accessible. We consider new datasets with signal and background generated with the Ginkgo model and use the cluster trellis to exactly compute the marginal likelihood under each hypothesis in order to calculate the exact optimal likelihood ratio. As a result, we can compare the performance of ML-based taggers to this optimal classifier.
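The optimal-tagger statement can be written compactly. In the notation assumed here, x is the set of observed jet constituents, S and B are the signal and background hypotheses, and the sum runs over the set of showering histories h compatible with x:

```latex
\[
  \Lambda(x) \;=\; \frac{p(x \mid \mathrm{S})}{p(x \mid \mathrm{B})},
  \qquad
  p(x \mid H) \;=\; \sum_{h \,\in\, \mathcal{H}(x)} p(x, h \mid H),
  \qquad H \in \{\mathrm{S}, \mathrm{B}\}.
\]
```

By the Neyman-Pearson lemma, thresholding \(\Lambda(x)\) gives the best achievable background rejection at each signal efficiency, so any monotonic function of it defines the reference curve against which ML-based taggers can be compared.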
We study unbinned multivariate analysis techniques, based on Statistical Learning, for indirect new physics searches at the LHC in the Effective Field Theory framework. We focus in particular on high-energy ZW production with fully leptonic decays, modeled at different degrees of refinement up to NLO in QCD. We show that a considerable gain in sensitivity is possible compared with current projections based on binned analyses. As expected, the gain is particularly significant for those operators that display a complex pattern of interference with the Standard Model amplitude. The most effective method is found to be the “Quadratic Classifier” approach, an improvement of the standard Statistical Learning classifier where the quadratic dependence of the differential cross section on the EFT Wilson coefficients is built-in and incorporated in the loss function. We argue that the Quadratic Classifier performances are nearly statistically optimal, based on a rigorous notion of optimality that we can establish for an approximate analytic description of the ZW process.
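The quadratic dependence mentioned above can be made explicit for a single Wilson coefficient c. The symbols a, b, \alpha and \beta below are assumptions introduced for this illustration; the second form is one way to build the quadratic structure, together with positivity of the cross section, into a parametrization that networks can learn:

```latex
\[
  \frac{d\sigma(x; c)}{d\sigma_{\mathrm{SM}}(x)}
  \;=\; 1 + c\,a(x) + c^{2}\,b(x)
  \;=\; \bigl(1 + c\,\alpha(x)\bigr)^{2} + \bigl(c\,\beta(x)\bigr)^{2},
  \qquad
  a = 2\alpha, \quad b = \alpha^{2} + \beta^{2}.
\]
```

Here the term linear in c describes the interference with the Standard Model amplitude, and the sum-of-squares form guarantees a non-negative ratio for every value of c.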
QCD splittings are among the most fundamental theory concepts at the LHC. In this talk, I will show how they can be studied systematically with the help of invertible neural networks. These networks work with sub-jet information to extract fundamental parameters from jet samples. Our approach expands the LEP measurements of QCD Casimirs to a systematic test of QCD properties based on low-level jet observables. Starting with a toy example, I will present the effect of the full shower, hadronization, and detector effects.
Recently, jet measurements in DIS events close to Born kinematics have been proposed as a new probe to study transverse-momentum-dependent (TMD) PDFs, TMD fragmentation functions, and TMD evolution. We report measurements of lepton-jet momentum imbalance and hadron-in-jet correlations in high-$Q^2$ DIS events collected with the H1 detector at HERA. The jets are reconstructed with the kT algorithm in the laboratory frame. These are two examples of a new type of TMD studies in DIS, which will serve as pathfinder for the Electron-Ion Collider program.
Measurements at colliders are often done by fitting data to simulations, which depend on many physical and unphysical parameters. One example is the top-quark mass, where parameters in simulation must be profiled when fitting the top-quark mass parameter. In particular, the dependence of top-quark mass fits on simulation parameters contributes to the error in the best measurements of the top-quark mass. In this talk, I discuss a simple new fitting method to reduce this error, where regression is done directly on ensembles of events. This method is superior at reducing the top-quark mass uncertainty when compared to both traditional histogram fitting methods as well as the modern ML DCTR method. More generally, machine learning from ensembles for parameter estimation has broad potential for collider physics measurements.
The Energy Mover's Distance was recently proposed as an advantageous metric to distinguish certain types of signals at the LHC. We explore generalizations of this distance to multiple families of signals and find performance similar to that of anomaly detection through variational autoencoders. We investigate this connection by exploring the correlation of event distances with distances in the latent space of the autoencoder.
We devise an autoencoder based strategy to facilitate anomaly detection for boosted jets, employing Graph Neural Networks (GNNs) to do so. To overcome known limitations of GNN autoencoders, we design a symmetric decoder capable of simultaneously reconstructing edge features and node features. Focusing on latent space based discriminators, we find that such setups provide a promising avenue to isolate new physics and competing SM signatures from sensitivity-limiting QCD jet contributions. We demonstrate the flexibility and broad applicability of this approach using examples of W bosons, top quarks, and hadronically-decaying exotic scalar bosons.
We describe the outcome of a data challenge conducted as part of the Dark Machines Initiative and the Les Houches 2019 workshop on Physics at TeV colliders. The challenge aims at detecting signals of new physics at the LHC using unsupervised learning algorithms. We define and describe a large benchmark dataset, consisting of > 1 Billion simulated LHC events. We then review a wide range of anomaly detection algorithms and measure their performance on the data challenge. We then assess the best-performing models on a still blinded dataset. Similarities between the best-performing models are observed and discussed.
We show how an anomaly detection algorithm could be integrated in a typical search for new physics in events with jets at the CERN Large Hadron Collider (LHC). We assume that an anomaly detection algorithm is given, trained to identify rare jet types, such as jets originating from the decay of a highly boosted massive particle. We demonstrate how this algorithm could be integrated in a search without disrupting the background-estimate strategy while enhancing the sensitivity to new physics. As an example, we consider convolutional variational autoencoders (VAEs) applied to dijet events. The proposed procedure can be generalized to any final state with jets. Once applied to real data, it could contribute to extend the sensitivity of the LHC experiments to previously uncovered new physics scenarios, e.g. broad-resonance and non-resonant jet production from new physics processes.
Autoencoders have been introduced in high energy physics as a promising tool for model-independent new physics searches. As a benchmark scenario, we study the tagging of top jet images in a background of QCD jet images. Although we reproduce the positive results from the literature, we show that the standard autoencoder setup cannot be considered as a model-independent anomaly tagger by inverting the task: the autoencoder fails to tag QCD jets if it is trained on top jets. We suggest improved performance measures for the task of model-independent anomaly detection. We also improve the capability of the autoencoder to learn non-trivial features of the jet images, such that it is able to achieve both top jet tagging and QCD jet tagging with the same setup. However, we want to stress that a truly model-independent and powerful autoencoder-based unsupervised jet tagger still needs to be developed.
Models with dark showers represent one of the most challenging possibilities for new physics at the LHC. One of the most difficult examples is a novel collider signature called a Soft Unclustered Energy Pattern (SUEP), which can arise in certain BSM models with a hidden valley sector that is both pseudo-conformal and strongly coupled over a large range of energy scales. Large-angle emissions are unsuppressed during the showering process, and if the hidden sector hadrons decay hadronically and promptly back into the Standard Model, the result is a high-multiplicity shower of SM final state particles that possess more democratically distributed energies and a much higher degree of isotropy than typically seen in QCD jets. This signature presents significant challenges to model, trigger on and search for, due to high theoretical uncertainties and the lack of isolated hard objects to identify in the detector. We outline an analysis strategy to look for SUEP produced by exotic decays of the Higgs boson, using conventional cuts on event-level observables and employing an autoencoder neural network trained on QCD background as an anomaly detector. We compare this unsupervised approach to a simple cut-and-count strategy as well as supervised machine learning models. We find that our strategy could allow the HL-LHC to exclude branching ratios of Higgs decay to SUEP down to a few percent.
As an alternative approach (w.r.t. deep generative models) for detecting out-of-distribution samples, we explore the possibility of employing jet classifiers as anomalous jet taggers. We also discuss the advantages and limitations of different approaches.
Deep generative models are becoming widely used across science and industry for a variety of purposes. A common challenge is achieving a precise implicit or explicit representation of the data probability density. Recent proposals have suggested using classifier weights to refine the learned density of deep generative models. We extend this idea to all types of generative models and show how latent space refinement via iterated generative modeling can circumvent topological obstructions and improve precision. This methodology also applies to cases where the target model is non-differentiable and has many internal latent dimensions which must be marginalized over before refinement. We demonstrate our Latent Space Refinement (LaSeR) protocol on a variety of examples, focusing on the combinations of Normalizing Flows and Generative Adversarial Networks.
Data compression plays a major role in the field of Machine Learning and recent works based on generative models such as Generative Adversarial Networks (GANs) have shown that deep-learning-based compression can outperform state-of-the-art classical compression methodologies. Such techniques can be adapted and applied to various areas in high energy physics, in particular to the study of the Parton Distribution Functions (PDFs) in which large Monte Carlo replica samples are required in order to get accurate results. In this talk, we present a compression algorithm for parton densities in which the statistics of a given input PDF set is further enhanced by the generation of synthetic replicas using a GAN prior to compression. This results in a compression methodology that is able to provide a compressed set with a smaller number of replicas and a more adequate representation of the original probability distribution.
Due to the expected increase in LHC data from the HL upgrade it is important to work on the efficiency of MC Event Generators in order to make theoretical predictions with the necessary precision accessible. One part of the calculation that could benefit from improvements is the generation of unweighted parton-level events. While adaptive multi-channel importance sampling combined with the Vegas algorithm is a very effective method for a wide range of scattering processes, it can become inefficient for challenging examples. Normalizing Flows are a recent machine learning development based on neural networks that provide trainable bijective mappings. We propose to use Normalizing Flows as a direct replacement for Vegas. The method guarantees full phase space coverage and the exact reproduction of the desired target distribution. We study the performance of the algorithm for a few representative examples, including top-quark pair production and gluon scattering into three- and four-gluon final states. We show that our method is able to achieve higher sampling performance than the traditional method for the simpler examples. Furthermore, we discuss the computational challenges and propose possible improvements that could boost the performance of the method also for more complex examples.
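The quantity at stake when generating unweighted events is the unweighting efficiency, the ratio of the mean to the maximum event weight. The toy sketch below compares a flat proposal with a proposal that matches the target exactly, for an assumed one-dimensional target f(x) = 2x on (0, 1]. All names and the target are assumptions for illustration; a normalizing flow would play the role of a learned mapping somewhere between these two extremes.

```python
import math
import random


def target(x):
    """Assumed toy integrand on (0, 1], normalised to unit integral."""
    return 2.0 * x


def sample_weights(n_events, proposal, seed=0):
    """Draw events from the proposal and return weights w = f(x) / q(x)."""
    rng = random.Random(seed)
    weights = []
    for _ in range(n_events):
        u = 1.0 - rng.random()  # uniform in (0, 1], avoids x = 0
        if proposal == "flat":
            x, q = u, 1.0
        else:
            # Inverse-CDF sampling of q(x) = 2x, i.e. a perfectly adapted map.
            x = math.sqrt(u)
            q = 2.0 * x
        weights.append(target(x) / q)
    return weights


def unweighting_efficiency(weights):
    """Fraction of events kept by hit-or-miss unweighting: <w> / max(w)."""
    return (sum(weights) / len(weights)) / max(weights)


flat_eff = unweighting_efficiency(sample_weights(10000, "flat"))
matched_eff = unweighting_efficiency(sample_weights(10000, "matched"))
```

With the flat proposal roughly half of the events survive unweighting, while the matched proposal gives unit weights and keeps every event. In both cases the mean weight estimates the same integral, which is the sense in which a bijective mapping with full phase-space coverage leaves the target distribution unchanged.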
Symmetries are ubiquitous and essential in physics, and the framework to describe symmetries is group theory. The symmetry described by the Lorentz group is essential in the dynamics of all particle physics experiments. A Lorentz-group-equivariant deep neural network framework, called the Lorentz group network (LGN), has been introduced by Bogatskiy et al. and tested for performance in classifying jets. The model uses irreducible representations of the Lorentz group to achieve equivariance with respect to Lorentz transformations. However, the architecture has not yet been extended to generative, compression, or anomaly detection tasks. We develop an autoencoder based on the architecture of the LGN for jet compression and reconstruction tasks, using a complex permutation invariant loss function. The model is tested to be fully equivariant (within numerical precision) and is trained on a dataset of high momentum jets simulated at the LHC. We analyze the latent space after training, and explore how the choices of hyperparameters, such as the multiplicities of scalars and vectors in the latent spaces and the number of basis functions for the edge features, can influence the model’s performance.
Ensemble learning is a technique where multiple component learners are combined through a protocol. In this talk, we will present an Ensemble Neural Network (ENN) that uses the combined latent-feature space of multiple neural network classifiers to improve the representation of the network hypothesis. We apply this approach to construct an ENN from Convolutional and Recurrent Neural Networks to discriminate top-quark jets from QCD jets. Such an ENN provides the flexibility to improve the classification beyond simple prediction combining methods by linking different sources of error correlations, hence improving the representation between data and hypothesis. In combination with Bayesian techniques, we show that it can reduce epistemic uncertainties and the entropy of the hypothesis by simultaneously exploiting various kinematic correlations of the system, which also makes the network less susceptible to a limitation in training sample size.
Graph neural networks (GNNs) have shown a lot of potential for jet tagging. Recent GNN algorithms such as ParticleNet, ABCNet, and LundNet represent the state-of-the-art in various jet tagging tasks. In this talk, we present some new progress on GNN design for jet tagging. With the incorporation of edge features and optimized network architecture, the new algorithm achieves a significant performance improvement on the top tagging benchmark.
The identification of boosted heavy particles such as top quarks or vector bosons is one of the key problems arising in experimental studies at the Large Hadron Collider. In this article, we introduce LundNet, a novel jet tagging method which relies on graph neural networks and an efficient description of the radiation patterns within a jet to optimally disentangle signatures of boosted objects from background events. We apply this framework to a number of different benchmarks, showing significantly improved performance for top tagging compared to existing state-of-the-art algorithms. We study the robustness of the LundNet taggers to non-perturbative and detector effects, and show how kinematic cuts in the Lund plane can mitigate overfitting of the neural network to model-dependent contributions. Finally, we consider the computational complexity of this method and its scaling as a function of kinematic Lund plane cuts, showing an order of magnitude improvement in speed over previous graph-based taggers.
In this talk we will present a procedure to separate boosted Higgs bosons decaying into hadrons from the background due to strong interactions. We employ the Lund jet plane to obtain a theoretically well-motivated representation of the jets of interest and we use the resulting images as the input to a convolutional neural network. In particular, we consider two different decay modes of the Higgs boson, namely into a pair of bottom quarks or into light jets, against the respective backgrounds. The performance of the tagger is compared to what is achieved using a traditional single-variable analysis which exploits a QCD-inspired color-singlet tagger, namely the jet color ring observable. Furthermore, we study the dependence of the tagger's performance on the requirement that the invariant mass of the selected jets should be close to the Higgs mass.
With the great promise of deep learning, discoveries of new particles at the Large Hadron Collider (LHC) may be imminent. Following the discovery of a new beyond-the-Standard-Model particle in an all-hadronic channel, deep learning can also be used to identify its quantum numbers. Convolutional neural networks (CNNs) using jet-images can significantly improve upon existing techniques to identify the quantum chromodynamic (QCD) 'color' as well as the spin of a two-prong resonance using its substructure. Additionally, jet-images are useful in determining what information in the jet radiation pattern is useful for classification, which could inspire future taggers. These techniques improve the categorization of new particles and are an important addition to the growing jet substructure toolkit, for searches and measurements at the LHC now and in the future.
We introduce a morphological analysis based on a neural network analyzing the Minkowski Functionals (MFs) of pixellated jet images. The MFs describe the geometric measures of binary images, and their changes by dilation encode the jet constituents' geometric structures that appear at various angular scales. We explicitly show that this morphological analysis can be considered a constrained convolutional neural network (CNN). Conversely, CNN could model the MFs, and we show their correlation in the example of tagging semi-visible jets emerging from the strong interaction of a hidden valley scenario. The MFs are independent of the IRC-safe observables commonly used in jet physics. We combine this morphological analysis with an IRC-safe relation network, which models two-point energy correlations. While the resulting network uses constrained input parameters, it shows comparable dark jet and top jet tagging performances to the CNN. The architecture has a significant advantage when the available data is limited, and we show that its tagging performance is much better than that of the CNN with a small number of training samples. We also qualitatively discuss their parton-shower model dependency. The results suggest that the MFs can be an efficient parameterization of the IRC-unsafe feature space of jets.
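As a rough illustration of the quantities involved, the sketch below computes the three Minkowski functionals of a 2D binary image (area, perimeter, Euler characteristic) and shows how they change under one dilation step. It is not the authors' implementation: the pixel-set representation, the closed-pixel connectivity convention, the 3x3 dilation kernel, and the function names `minkowski_functionals` and `dilate` are all assumptions made for this example.

```python
def minkowski_functionals(pixels):
    """Area, perimeter and Euler characteristic of a set of filled (row, col) pixels.

    Each pixel is treated as a closed unit square, so pixels touching only
    at a corner count as connected (an illustrative convention).
    """
    pixels = set(pixels)
    vertices, edges = set(), set()
    perimeter = 0
    for r, c in pixels:
        # Corners of the unit square occupied by this pixel.
        vertices.update({(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)})
        # Its four sides: "h" is a horizontal side, "v" a vertical side,
        # each labelled by the grid point where it starts.
        edges.update({("h", r, c), ("h", r + 1, c), ("v", r, c), ("v", r, c + 1)})
        # A side adds to the perimeter when the pixel across it is empty.
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            if (r + dr, c + dc) not in pixels:
                perimeter += 1
    area = len(pixels)
    euler = len(vertices) - len(edges) + area  # V - E + F
    return area, perimeter, euler


def dilate(pixels):
    """Dilate by a 3x3 square: every filled pixel also fills its 8 neighbours."""
    return {(r + dr, c + dc)
            for r, c in pixels
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)}


# Two isolated hits are separate components before dilation and merge after it.
hits = {(0, 0), (0, 3)}
before = minkowski_functionals(hits)
after = minkowski_functionals(dilate(hits))
```

Tracking the triple over successive dilations gives a small feature vector per image; here the Euler characteristic drops from 2 to 1 once the two hits merge, which is the kind of angular-scale information the abstract refers to.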
Identification of hadronic decays of highly Lorentz-boosted W/Z/Higgs bosons and top quarks provides powerful handles to a wide range of new physics searches and Standard Model measurements at the LHC. This talk presents recent advances in boosted jet tagging algorithms in CMS. The application of novel machine-learning techniques has substantially improved the tagging performance and led to a significant increase in the physics reach.
We train a Convolutional Neural Network to classify longitudinally and transversely polarized hadronic $W^\pm$ using the images of boosted $W^{\pm}$ jets as input. The images capture angular and energy information from the jet constituents that is faithful to the properties of the original quark/anti-quark $W^{\pm}$ decay products without the need for invasive substructure cuts. We find that the difference between the polarizations is too subtle for the network to be used as an event-by-event tagger. However, given an ensemble of $W^{\pm}$ events with unknown polarization, the average network output from that ensemble can be used to extract the longitudinal fraction $f_L$. We test the network on Standard Model $pp \to W^{\pm}Z$ events and on $pp \to W^{\pm}Z$ in the presence of dimension-6 operators that perturb the polarization composition.
It is widely known that predictions for jet substructure features vary significantly between Monte Carlo generators. This is especially true for the output of deep neural networks (NN) trained with high-dimensional feature spaces to tag the origin of a jet. However, even though the spectra of a given NN vary between generators, it could be that the function learned from different generators is the same. We investigate the universality of jet substructure information by training NNs with a variety of generators and testing them on the same generator. By fixing the testing generator, we can see if the NNs have learned to use the same information, even if the extent to which that information is expressed varies between training datasets. Our target physics process is boosted Higgs bosons and we explore the implications of universality on uncertainties for searches for new particles at the Large Hadron Collider and beyond.
In High Energy Physics experiments Particle Flow (PFlow) algorithms are designed to provide an optimal reconstruction of the nature and kinematic properties of the particles produced within the detector acceptance during collisions. At the heart of PFlow algorithms is the ability to distinguish the calorimeter energy deposits of neutral particles from those of charged particles, using the complementary measurements of charged particle tracking devices, to provide a superior measurement of the particle content and kinematics. In this presentation, a computer vision approach to this fundamental aspect of PFlow algorithms, based on calorimeter images, is proposed. A comparative study of the state of the art deep learning techniques is performed. A significantly improved reconstruction of the neutral particle calorimeter energy deposits is obtained in a context of large overlaps with the deposits from charged particles. Calorimeter images with augmented finer granularity are also obtained using super-resolution techniques.
Reconstructing the jet transverse momentum ($p_{\rm T}$) is a challenging task, particularly in heavy-ion collisions due to the large fluctuating background from the underlying event. In recent years, ALICE has developed a novel method to correct jets for this large background using machine learning techniques. This analysis intentionally does not utilize deep learning methods and instead utilizes a shallow neural network for simplicity. This approach uses jet properties, including the constituents of the jet, to create a mapping between the corrected and uncorrected jet $p_{\rm T}$. In comparison to the standard ALICE method, this machine learning based estimator demonstrates a significantly improved performance. The ML estimator was then applied to data in order to perform a measurement of full jets (jets containing both charged and neutral constituents) to lower transverse momenta than previously possible in ALICE. An ongoing challenge facing ML for jet physics is the interpretation of results and understanding potential biases. In this particular result, including constituent information in training introduces a bias towards PYTHIA-like fragmentation patterns, which has been shown to differ from the fragmentation measured in Pb--Pb data. Recent studies focusing on this bias in an attempt to further investigate and quantify its impact will be shown.
A common problem that appears in collider physics is the inference of a random variable $Y$ given a measurement of another random variable $X$, and the estimation of the uncertainty on $Y$. Additionally, one would like to quantify the extent to which $X$ and $Y$ are related. We present a machine learning framework for performing frequentist maximum likelihood inference with uncertainty estimation and measuring the mutual information between random variables. By using the Donsker-Varadhan representation of the KL divergence, the framework learns the likelihood ratio $p(x|y)/p(x)$. This can be used to calculate the mutual information between $X$ and $Y$. The framework is parameterized using a Gaussian ansatz, which enables a manifest extraction of the maximum likelihood values and uncertainties. All of this can be accomplished in a single training of the model. We then demonstrate our framework for a simple Gaussian example, and apply it to a realistic calibration task by calculating jet energy correction (JEC) and jet energy resolution (JER) factors for CMS open data.
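For reference, the standard identities behind this construction can be written out as follows. This is the textbook form of the Donsker-Varadhan representation, not a formula taken from the talk; the symbol $T$ for the learned function is a notational choice made here.

```latex
\begin{align}
D_{\mathrm{KL}}(P \,\|\, Q) &= \sup_{T} \left\{ \mathbb{E}_{P}[T] - \log \mathbb{E}_{Q}\!\left[e^{T}\right] \right\}, \\
I(X;Y) &= D_{\mathrm{KL}}\big(p(x,y) \,\|\, p(x)\,p(y)\big), \\
T^{*}(x,y) &= \log \frac{p(x|y)}{p(x)} + \text{const}.
\end{align}
```

Choosing $P = p(x,y)$ and $Q = p(x)\,p(y)$, the supremum is attained at $T^{*}$, so a network trained to maximize the bound learns the log likelihood ratio up to a constant, and the value of the bound at the optimum is the mutual information.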
Advanced machine-learning techniques have recently begun to be explored by the CMS collaboration in various areas of jet physics, beyond jet classification. We present the most recent developments for the jet energy calibration and the jet mass reconstruction. In both cases, novel algorithms using state-of-the-art machine-learning techniques have been developed. Significant improvement compared to traditional methods is observed, which translates to improved physics reach.
We introduce CaloFlow, a fast detector simulation framework based on normalizing flows. For the first time, we demonstrate that normalizing flows can reproduce high-granularity calorimeter simulations with extremely high fidelity, providing a fresh alternative to computationally expensive GEANT4 simulations, as well as other state-of-the-art fast simulation frameworks based on GANs and VAEs. Besides the usual histograms of physical features and images of calorimeter showers, we introduce a new metric for judging the quality of generative modeling: the performance of a classifier trained to differentiate real from generated images. We show that GAN-generated images can be identified by the classifier with 100% accuracy, while images generated from CaloFlow are able to fool the classifier much of the time. More broadly, normalizing flows offer several advantages compared to other state-of-the-art approaches (GANs and VAEs), including: tractable likelihoods; stable and convergent training; and principled model selection. Normalizing flows also provide a bijective mapping between data and the latent space, which could have other applications beyond simulation, for example, to detector unfolding.
The extensive physics program of HEP experiments relies on simulated Monte Carlo events. This simulation provides a highly detailed detector response modeling. However, this simulation, dominated by the calorimeter showers, becomes very slow in the context of the high-luminosity LHC. Collecting an order of magnitude more data remains necessary to lower the statistical uncertainties. Several research directions have investigated the use of Machine Learning (ML) based models for fast simulation of one specific detector. Each model tries to mimic the subtle and complex detector response, resulting in a very finely tuned simulation. In this study, we explore the use of an ML multi-detector geometry model for fast simulation.
Modern high energy physics crucially relies on simulation to connect experimental observations to underlying theory. While traditional methods relying on Monte Carlo techniques produce powerful simulation tools, they prove to be computationally expensive. This is particularly true when they are applied to calorimeter shower simulation, where many particle interactions occur. The strain on computing resources due to simulation is projected to be so large as to be a major bottleneck at the high luminosity stage of the LHC and for future colliders.
Deep generative models have attracted significant attention as an approach which promises to drastically reduce the computing time required for simulation. Recent work in our group has demonstrated the capability of various generative models to accurately reproduce showers displaying key physics properties in a highly granular calorimeter. This initial work focused on the specific case of a particle incident perpendicular to the calorimeter face, however a practical simulator must incorporate arbitrary angles of incidence and simulate them correctly. This talk will address ongoing efforts to add conditioning on the incident angle of the particle. In particular, we demonstrate the crucial importance of modifying an existing loss function via the addition of an auxiliary constrainer network, in order to improve the angular performance of a generator.
Generative machine learning models are a promising way to efficiently amplify classical Monte Carlo generators' statistics for event simulation and generation in particle physics. The high computational cost of the simulation and the expected increase in data in the high-precision era of the LHC and at future colliders indicate that we urgently need such fast surrogate simulators. We present a status update on simulating particle showers in high granularity calorimeters for future colliders. Building on prior work using Generative Adversarial Networks (GANs), Wasserstein-GANs, and the information-theoretically motivated Bounded Information Bottleneck Autoencoder (BIB-AE), we achieve further improvements of the fidelity of generated photon showers. The key to this improvement is a detailed understanding and optimization of the latent space. The richer structure of hadronic showers compared to electromagnetic ones makes their precise modeling an important yet challenging problem. We present initial progress towards accurately simulating the core of hadronic showers in a highly granular scintillator calorimeter.
AtlFast3 is the next generation of high precision fast simulation in ATLAS that is being deployed by the collaboration and will replace AtlFastII, the fast simulation tool that was successfully used until now. AtlFast3 combines a parametrization-based Fast Calorimeter Simulation and a new machine-learning based Fast Calorimeter Simulation based on Generative Adversarial Networks (GANs). The new fast simulation improves the accuracy of simulating objects used in analyses when compared to Geant4, with a focus on those that were poorly modelled in AtlFastII. In particular, the simulation of jets of particles reconstructed with large radii and the detailed description of their substructure are significantly improved in AtlFast3. Additionally, the agreement between AtlFast3 and Geant4 is improved for high momentum $\tau$-leptons. The modelling and performance are evaluated on events produced at 13 TeV centre-of-mass energy in the Run-2 data-taking conditions.
At the LHC, each bunch crossing is able to create thousands of particles. Identifying a collision of interest among additional “pileup” collisions is a difficult task, requiring the development of dedicated methods. Commonly used methods are, however, not scalable to future LHC upgrades, where the average number of interactions will increase by almost an order of magnitude. To tackle this challenge, machine learning methods for pileup mitigation are currently being developed to improve and replace standard algorithms. In this talk, an overview of pileup mitigation methods using machine learning in CMS is presented.
We apply object detection techniques based on convolutional blocks to jet reconstruction and identification at the CERN Large Hadron Collider. We use particles reconstructed through a Particle Flow algorithm to represent each event as an image composed of calorimeter and tracker cells, which serves as input to a Single Shot Detection network called PFJet-SSD. The network performs simultaneous localization, classification and auxiliary regression tasks to measure jet features. We investigate Ternary Weight Networks, with weights quantized to the set {-1, 0, 1} times layer- and channel-dependent scaling factors, to reduce memory footprint and latency. We show that the quantized version of the network closely matches the performance of its full-precision equivalent while both outperform the physics baseline. Finally, we report the inference latency on an Nvidia Tesla T4.
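To make the ternary quantization concrete, here is a minimal sketch of quantizing one channel of weights to {-1, 0, 1} times a per-channel scale. The abstract does not say how thresholds or scales are chosen in PFJet-SSD; the threshold of 0.7 times the mean absolute weight is a heuristic commonly used for ternary weight networks and is an assumption here, as are the names `ternarize_channel` and `dequantize`.

```python
def ternarize_channel(weights, threshold_factor=0.7):
    """Quantize one channel to codes in {-1, 0, 1} plus a single scaling factor.

    threshold_factor is an assumed heuristic, not a value from the talk.
    """
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = threshold_factor * mean_abs
    # Small weights are zeroed; the rest keep only their sign.
    codes = [0 if abs(w) <= delta else (1 if w > 0 else -1) for w in weights]
    # The scale is the mean magnitude of the weights that survived.
    kept = [abs(w) for w, code in zip(weights, codes) if code != 0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return codes, scale


def dequantize(codes, scale):
    """Reconstruct approximate weights from the ternary codes and the scale."""
    return [scale * code for code in codes]


# A toy layer with two channels, each quantized with its own scale.
layer = [[0.9, -0.8, 0.05, -0.1], [0.02, 0.03, -0.01, 0.0]]
quantized = [ternarize_channel(channel) for channel in layer]
```

Storing two bits per weight plus one float per channel is what reduces memory, and multiplications by -1, 0 or 1 are what can reduce latency on suitable hardware.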
The performance demands of future particle-physics experiments investigating the high-energy frontier pose a number of new challenges, forcing us to find new solutions for the detection, identification, and measurement of final-state particles in subnuclear collisions. One such challenge is the precise measurement of muon momenta at very high energy, where the curvature provided by conceivable magnetic fields in realistic detectors proves insufficient to achieve the desired resolution. In this work we show the feasibility of an entirely new avenue for the measurement of the energy of muons based on their radiative losses in a dense, finely segmented calorimeter.
Using a task-specific 3D convolutional neural network, the raw energy deposits in the calorimeter cells may be used as inputs to regress to the energy of the originating muon. We demonstrate that this approach provides superior resolution for high energy muons. Additionally, due to the differing energy dependence, we show that the regression is entirely complementary to traditional tracker-based measurements, which degrade with energy, together allowing one to achieve good resolution across the energy spectrum.
Jet interactions in a hot QCD medium created in heavy-ion collisions are conventionally assessed by measuring the modification of the distributions of jet observables with respect to the proton-proton baseline. However, the steeply falling production spectrum introduces a strong bias toward small energy losses that obfuscates a direct interpretation of the impact of medium effects in the measured jet ensemble. In this talk, we will explore the power of deep learning techniques to tackle this issue on a jet-by-jet basis.
Toward this goal, we employ a convolutional neural network (CNN) to diagnose such modifications from jet images, where the training and validation are performed using the hybrid strong/weak coupling model. By analyzing measured jets in heavy-ion collisions, we extract the original jet transverse momentum, i.e., the transverse momentum of an identical jet that did not pass through a medium, in terms of an energy loss ratio. Despite many sources of fluctuations, we achieve good performance and put emphasis on the interpretability of our results. We observe that the angular distribution of soft particles in the jet cone and their relative contribution to the total jet energy contain significant discriminating power, which can be exploited to tailor observables that provide a good estimate of the energy loss ratio.
With a well-predicted energy loss ratio, we study a set of jet observables to estimate their sensitivity to bias effects and reveal their medium modifications when compared to a more equivalent jet population, i.e., a set of jets with similar initial energy. Then, we show how this new technique provides unique access to the initial configuration of jets over the transverse plane of the nuclear collision, both with respect to their production point and initial orientation. Finally, we demonstrate the capability of our new method to locate with unprecedented precision the production point of a dijet pair in the nuclear overlap region, in what constitutes an important step forward towards the long term quest of using jets as tomographic probes of the quark-gluon plasma.
[1] Yi-Lun Du, Daniel Pablos, Konrad Tywoniuk, Deep learning jet modifications in heavy-ion collisions, arXiv:2012.07797 [hep-ph], JHEP 2021, 206 (2021)
Typically, high-energy physics (HEP) data analysis heavily relies on the production and the storage of large datasets of simulated events. At the LHC, the end-to-end simulation workflow can require up to 50% of the available computing resources of an experiment. Speeding up the simulation process would be crucial to save resources that could be otherwise utilized.
In our study, we investigate the use of Deep Generative Models, and more specifically of Variational Autoencoders, for fast simulation of jets at the LHC. We represent jets as lists of jet constituents (particles) characterized by their momenta. Starting from a simulation of the jet before detector effects, we train a Deep Variational Autoencoder (VAE) to learn a parametric description of the detector response and produce the corresponding list of jet constituents. Doing so, we bypass both the detector simulation and the reconstruction, potentially speeding up the event generation workflow significantly.
For our domain-specific application, we use a permutation-invariant loss function, the Chamfer distance, as the reconstruction loss of the Variational Autoencoder based on the jet constituents. We further modify the reconstruction loss by adding extra penalty terms for the jet mass and the jet transverse momentum $p_T$ to impose physics constraints on the model so that it learns the jet kinematics. Preliminary results show that jet features like the jet pseudorapidity $\eta$, the jet azimuthal angle $\phi$, or the jet Cartesian momenta $p_{x}$, $p_{y}$, $p_{z}$ are modelled to within 10% accuracy, whereas the jet mass is harder to learn.
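A minimal sketch of such a loss is given below, assuming each constituent is a tuple $(p_x, p_y, p_z)$. The summed squared-distance form of the Chamfer distance, the penalty weight `pt_weight`, and the function names are assumptions made for illustration; the mass penalty is omitted because it would also need constituent energies.

```python
import math


def chamfer_distance(set_a, set_b):
    """Symmetric Chamfer distance between two sets of momentum vectors.

    Each point is matched to its nearest neighbour in the other set, so the
    result does not depend on the order of the constituents.
    """
    def sq_dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    a_to_b = sum(min(sq_dist(p, q) for q in set_b) for p in set_a)
    b_to_a = sum(min(sq_dist(p, q) for p in set_a) for q in set_b)
    return a_to_b + b_to_a


def jet_pt(constituents):
    """Transverse momentum of the jet, from the summed px and py."""
    px = sum(p[0] for p in constituents)
    py = sum(p[1] for p in constituents)
    return math.hypot(px, py)


def reconstruction_loss(target, output, pt_weight=1.0):
    """Chamfer term plus a squared penalty on the jet pT mismatch."""
    pt_penalty = (jet_pt(target) - jet_pt(output)) ** 2
    return chamfer_distance(target, output) + pt_weight * pt_penalty


# Reordering the constituents leaves the loss unchanged.
jet = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
shuffled = [(0.0, 2.0, 0.0), (1.0, 0.0, 0.0)]
loss_same = reconstruction_loss(jet, shuffled)
```

Because the nearest-neighbour matching is many-to-one, the Chamfer term alone does not force the summed momentum to agree, which is one reason an explicit jet-level penalty such as the $p_T$ term can help.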
We introduce a novel strategy for machine-learning-based fast simulators, which is the first that can be trained in an unsupervised manner using observed data samples to learn a predictive model of detector response and other difficult-to-model transformations. Across the physical sciences, a barrier to interpreting observed data is the lack of knowledge of a detector's imperfect resolution, which transforms and obscures the unobserved latent data. Modeling this detector response is an essential step for statistical inference, but closed-form models are often unavailable or intractable, requiring the use of computationally expensive, ad-hoc numerical simulations. Using particle physics detectors as an illustrative example, we describe a novel strategy for a fast, predictive simulator called Optimal Transport based Unfolding and Simulation (OTUS), which uses a probabilistic autoencoder to learn this transformation directly from observed data, rather than from existing simulations. Unusually, the probabilistic autoencoder's latent space is physically meaningful, such that the decoder becomes a fast, predictive simulator for a new latent sample, and has the potential to replace Monte Carlo simulators. We provide proof-of-principle results for $Z$-boson and top-quark decays, but stress that our approach can be widely applied to other physical science fields.
There has been significant development recently in generative models for accelerating LHC simulations. Work on simulating jets has primarily used image-based representations, which tend to be sparse and of limited resolution. We advocate for the more natural ‘particle cloud’ representation of jets, i.e. as a set of particles in momentum space, and discuss four physics- and computer-vision-inspired metrics: (1) the 1-Wasserstein distance between high- and low-level feature distributions; (2) a new Fréchet ParticleNet Distance; (3) the coverage; and (4) the minimum matching distance as means of quantitatively and holistically evaluating generated particle clouds. We then present our new message-passing generative adversarial network (MPGAN), which has excellent performance on gluon, top quark, and lighter quark jets on all metrics, evaluated against real samples via bootstrapping as well as existing point cloud GANs, and shows promise for use in HEP.
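The coverage and minimum matching distance can be sketched as follows, using the definitions common in the point-cloud generation literature. The distance between two jets is passed in as a function; which distance the authors use is not stated in the abstract, so the absolute difference in the toy example, like the name `coverage_and_mmd`, is purely illustrative.

```python
def coverage_and_mmd(real, generated, distance):
    """Coverage and minimum matching distance between two sample collections.

    coverage: fraction of real samples that are the nearest real neighbour
              of at least one generated sample.
    mmd:      average, over real samples, of the distance to the closest
              generated sample.
    """
    matched = set()
    for g in generated:
        # Index of the real sample closest to this generated sample.
        nearest = min(range(len(real)), key=lambda i: distance(real[i], g))
        matched.add(nearest)
    coverage = len(matched) / len(real)
    mmd = sum(min(distance(r, g) for g in generated) for r in real) / len(real)
    return coverage, mmd


# Toy one-dimensional "jets": both generated samples sit near the first real one.
real_samples = [0.0, 10.0]
generated_samples = [1.0, 2.0]
coverage, mmd = coverage_and_mmd(
    real_samples, generated_samples, lambda a, b: abs(a - b)
)
```

In the toy case the generator misses the second real sample entirely, which shows up as a coverage of one half and an inflated matching distance; low coverage is a typical sign of mode collapse.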
We present an implementation of an explainable and physics-aware machine learning model capable of inferring the underlying physics of high-energy particle collisions using the information encoded in the energy-momentum four-vectors of the final state particles. We demonstrate the proof-of-concept of our White Box AI approach using a Generative Adversarial Network (GAN) which learns from a DGLAP-based parton shower Monte Carlo event generator. Our approach leads to a network that is able to learn not only the final distribution of particles, but also the underlying parton branching mechanism, i.e. the Altarelli-Parisi splitting function, the ordering variable of the shower, and the scaling behavior. While the current work is focused on perturbative physics of the parton shower, we foresee a broad range of applications of our framework to areas that are currently difficult to address from first principles in QCD. Examples include nonperturbative and collective effects, factorization breaking and the modification of the parton shower in heavy-ion, and electron-nucleus collisions.
Event generation with neural networks has seen significant progress recently. The big open question is still how such new methods will accelerate LHC simulations to the level required by upcoming LHC runs. We target a known bottleneck of standard simulations and show how their unweighting procedure can be improved by generative networks. This can, potentially, lead to a very significant gain in simulation speed.
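For context, the standard hit-or-miss unweighting step that such work aims to improve can be sketched as below. This is the textbook procedure, not the generative approach of the talk; non-negative weights with a positive maximum are assumed, and the function names are hypothetical.

```python
import random


def unweight(events, weights, rng=None, w_max=None):
    """Keep each event with probability w / w_max (hit-or-miss unweighting).

    Accepted events can then be treated as having unit weight.
    Assumes non-negative weights and a positive maximum weight.
    """
    rng = rng or random.Random()
    if w_max is None:
        w_max = max(weights)
    return [e for e, w in zip(events, weights) if rng.random() < w / w_max]


def unweighting_efficiency(weights):
    """Expected fraction of events kept: mean weight over maximum weight."""
    return (sum(weights) / len(weights)) / max(weights)


# Events at the maximum weight are always kept; zero-weight events never are.
events = ["a", "b", "c", "d"]
weights = [0.0, 2.0, 0.0, 2.0]
kept = unweight(events, weights, rng=random.Random(0))
efficiency = unweighting_efficiency(weights)
```

The efficiency is low when a few events carry weights far above the average, since most generated events are then thrown away; that loss is the bottleneck the abstract targets.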
A key aspect of the study of particle collisions is the comparison of experimental data with data from computer simulations, mainly obtained using Monte Carlo-based generators. However, the amount of data required in simulations makes this task very time consuming. One way to address this issue is to use machine learning techniques to speed up the process.
In this work, we focus on the simulation of hadronic jets, one of the final-state objects of particle collisions. For this study the input dataset consists of the particle constituents of the jets, chosen for its sparsity and for the possibility of assessing the network's capacity to describe the properties of both jets and particles. The generative neural network architecture chosen is a variational autoencoder consisting of convolutional layers. For the reconstruction error term we choose a permutation-invariant loss on the particles' properties along with mean-squared error terms measuring the difference between the input and output jet transverse momentum and mass, which imposes physics constraints, allowing the model to learn the kinematics of the jets.
QCD-jets at the LHC are described by simple physics principles. We show how super-resolution generative networks can learn the underlying structures and use them to improve the resolution of jet images. We test this approach on massless QCD-jets and on fat top-jets and find that the network reproduces their main features even without training on pure samples. In addition, we show how a slim network architecture can be constructed once we have control of the full network performance.
Generating large numbers of events efficiently is a major bottleneck for ML projects. As a first step towards a full-fledged event generator for modern GPUs, we investigated different recursive strategies. The GPU implementations are compared to the state-of-the-art CPU codes, showing promise for using these in other pipelines. Finally, we propose baseline implementations for the development of a future full scale event generator on GPUs.
We investigate the possibility of using Deep Learning algorithms for jet identification in the L1 trigger at HL-LHC. We perform a survey of architectures (MLP, CNN, Graph Networks) and benchmark their performance and resource consumption on FPGAs using a QKeras+hls4ml compression-aware training procedure. We use the HLS4ML jet dataset to compare the results obtained in this study to previous literature on Fast Machine Learning applications on FPGAs.
The high collision rates at the Large Hadron Collider (LHC) make it impossible to store every single observed interaction. For this reason, only a small subset that passes so-called triggers, which select potentially interesting events, is saved while the remainder is discarded. This makes it difficult to perform searches in regions that are usually ignored by trigger setups, for example at low energies. However, a sufficiently efficient data compression method could help these searches by storing information about more events than can be saved offline.
We investigate the use of a generative machine learning model (specifically a normalizing flow) for the purpose of this compression. The model is trained online on the collision data, essentially encoding the underlying data structure in the network weights. The data generated by the trained model can then be analysed and for example probed for anomalies offline.
We demonstrate this method for a simple bump hunt, showing that we can detect resonances that would have been missed under regular trigger setups.
The Reproducible Open Benchmarks for Data Analysis Platform (ROB) is a platform that allows for the evaluation of different data analysis algorithms in a controlled competition-style format [1]. One example of such a comparison and evaluation of different algorithms is the “The Machine Learning Landscape of Top Taggers” paper, which compiled and compared multiple different top tagger neural networks [2]. Motivated by the significant amount of time required to organize and evaluate such benchmarks, ROB provides a platform that automates the collection, execution, and comparison of participant submissions in a benchmark. Although convenient, the ROB currently requires participants to package their submissions into Docker containers, which can pose an additional burden due to the steep learning curve.
To increase ease of use, we implement support for the commonly used Jupyter Notebooks [3] in ROB. Jupyter Notebooks are a popular tool that many physicists are already familiar with. Using Jupyter notebooks, physicists are able to combine live code, comments, and documentation inside one document. By utilizing the PaperMill package [4], we allow ROB users to submit their implementations directly as Jupyter Notebooks in order to evaluate different data analysis algorithms without the need to package the code into Docker containers. To demonstrate functionality and spur usage of the ROB, we provide demos using bottom and top tagging neural networks that display the application of the ROB within particle physics as a way of providing a competition style platform for algorithm evaluation [5].
References:
[1] “Reproducible and Reusable Data Analysis Workflow Server”, https://github.com/scailfin/flowserv-core
[2] Kasieczka, Gregor, Plehn, Tilman, Butter, Anja, Cranmer, Kyle, Debnath, Dipsikha, Dillon, Barry M, . . . Varma, Sreedevi. (2019). The Machine Learning landscape of top taggers. SciPost Physics, 7(1), 014.
[3] “Jupyter Notebooks”, https://jupyter.org/
[4] “Papermill”, https://papermill.readthedocs.io/en/latest/
[5] “Particle Physics”, https://github.com/anrunw/ROB
We introduce a collection of datasets from fundamental physics research, including particle physics, astroparticle physics, hadron physics, and nuclear physics, for supervised machine learning studies. These datasets, containing hadronic top quarks, cosmic air showers, phase transitions in hadronic matter, and generator-level histories, are combined and made public to simplify future work on cross-disciplinary machine learning and transfer learning in fundamental physics. Based on these samples, we present two simple yet flexible models: a fully connected neural network and a graph-based neural network architecture that can easily be applied to a wide range of supervised learning tasks in these domains. Furthermore, we show that our approaches reach performance close to state-of-the-art dedicated methods on all datasets.
The data challenge is "anomaly detection @ 40 MHz" for which the biggest concern
is to fit an algorithm in the tight constraints, which are presented in the talk.
Considering as a benchmark an inclusive data stream, which has been pre-filtered
by requiring the presence of one lepton, we discuss different possible strategies
to detect new physics events as anomalies. The main goal of the challenge is
then to seek new ideas on how to do anomaly detection.
This talk is about how we can use ML to identify symmetries (conserved quantities) of physical systems. I report on three different strategies to find symmetries:
1) By examining the embedding a (deep) neural network adopts on a simple supervised task (2003.13679).
2) By imposing a modification to Hamiltonian Neural Networks such that a coordinate transformation ensures the emergence of conserved quantities (symmetry control neural networks, 2104.14444).
3) By searching for a Lax pair/connection to identify whether a system is integrable (2103.07475), i.e. it has as many conserved quantities as degrees of freedom.
I comment on how strategies 1) and 3) enable us to search for new mathematical structures and how 2) can be used to accelerate simulations.
Unsupervised anomaly detection could be crucial in future analyses searching for rare phenomena in large datasets, such as those collected at the LHC. To this end, we introduce a physics-inspired variational autoencoder (VAE) architecture which performs competitively and robustly on the LHC Olympics Machine Learning Challenge datasets. We demonstrate how embedding some physical observables directly into the VAE latent space, while at the same time keeping the classifier manifestly agnostic to them, can help to identify and characterise features in measured spectra as caused by the presence of anomalies in a dataset.
Symmetries are a fundamental property of functions applied to datasets. A key function for any dataset is the probability density, and the corresponding symmetries are often referred to as the symmetries of the dataset itself. We provide a rigorous statistical notion of symmetry for a dataset, which involves reference datasets that we call inertial in analogy to inertial frames in classical mechanics. Then, we construct a novel approach to automatically discover symmetries from a dataset using a deep learning method based on an adversarial neural network. We show how this model performs on simple examples and provide a corresponding analytic description of the loss landscape. Symmetry discovery may lead to new insights and can reduce the effective dimensionality of a dataset to increase its effective statistics.
Fundamental laws of physics introduce specific topological features in the phase-space of n-body processes in collider events. We introduce a new analysis approach relying on analyzing such global topological properties of the manifold over the distribution of events. One specific property of potential interest is the dimensionality of the phase space. It can, for example, be used for clustering events and discovering anomalies in an unsupervised way.
Focusing on the Drell-Yan process with and without Z-resonance, we show that the dimensionality can be accurately estimated using the minimal neighborhood information. As a resonance reduces the dimensionality by one, we can use this to separate the two processes. Our approach can be extended to more complicated processes and generally has a potentially wide range of applications in particle physics.
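As a rough illustration of estimating dimensionality from minimal neighborhood information, the sketch below uses the two-nearest-neighbour ratio estimator, which is one common choice and not necessarily the one used in this work. The toy point clouds (a flat sheet and a filled cube) and the function name `two_nn_dimension` are assumptions made for illustration; they stand in for event samples whose phase space differs by one dimension.

```python
import math
import random


def two_nn_dimension(points):
    """Estimate intrinsic dimension from ratios of 2nd to 1st neighbour distances."""
    log_ratios = []
    for i, p in enumerate(points):
        # Sorted distances from point i to every other point.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = dists[0], dists[1]
        if r1 > 0.0:
            log_ratios.append(math.log(r2 / r1))
    # Maximum-likelihood estimate, assuming r2/r1 follows a Pareto law
    # whose exponent is the intrinsic dimension.
    return len(log_ratios) / sum(log_ratios)


rng = random.Random(0)
n = 400
# Toy samples (assumed, not physics data): a 2D sheet and a 3D volume in 3D space.
sheet = [(rng.random(), rng.random(), 0.0) for _ in range(n)]
volume = [(rng.random(), rng.random(), rng.random()) for _ in range(n)]

dim_sheet = two_nn_dimension(sheet)
dim_volume = two_nn_dimension(volume)
print(f"sheet: {dim_sheet:.2f}, volume: {dim_volume:.2f}")
```

Only the two nearest neighbours of each point enter the estimate, so it is largely insensitive to the global shape of the distribution. A gap of about one unit between the two estimates is the kind of signature that could separate samples whose dimensionality differs by one.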
We build a simple probabilistic model for collider events represented by a pattern of points in a space of high-level observables. The model is based on three assumptions for the point data: the measurements in individual events are discrete, exchangeable, and generated from a mixture of latent distributions, or 'themes'. The result is a mixed-membership model known as Latent Dirichlet Allocation (LDA), extensively used in natural language processing, biology and many unsupervised machine learning applications. By training on point patterns in the Lund jet plane, we demonstrate that a two-theme LDA model can be used for fully unsupervised event classification. As an example, we show that the LDA classifier can detect a BSM heavy resonance hidden in dijet data.
Deep neural networks (DNNs) are essential tools in particle physics targeting various use cases ranging from reconstruction of particles up to event classification and anomaly detection. Whereas DNNs for event classification are primarily trained on quantities deduced from the kinematic properties of the particles in the final state (high-level observables), we present an alternative approach of using exclusively basic kinematic object information (low-level observables) in combination with attention and dynamic graph convolution neural networks. Their performance is evaluated in terms of the potential of discriminating events in which the Higgs boson is produced in association with top quarks and decays into a pair of bottom quarks from the overwhelming background, which is top quark pair production in association with b jets. Studying the Higgs boson in such a complex final state with high object multiplicity is simultaneously challenging and crucially important for precision tests of the standard model.
Furthermore, DNNs are often seen as black boxes due to the difficulty of understanding what information they learn. We present detailed studies of the latent spaces of both the attention and graph networks in order to provide insights into what the neural networks learn about the physics and event topology as well as to demonstrate the potential of such networks for alternative analysis approaches, e.g., mass peak searches.
Autoencoders as tools behind anomaly searches at the LHC have the structural problem that they only work in one direction, extracting jets with higher complexity but not the other way around. To address this, we derive classifiers from the latent space of (variational) autoencoders, specifically in Gaussian mixture and Dirichlet latent spaces. In particular, the Dirichlet setup solves the problem and improves both the performance and the interpretability of the networks.
Given the increasing data collection capabilities and limited computing resources of future collider experiments, interest in using generative neural networks for the fast simulation of collider events is growing. In our previous study, the Bounded Information Bottleneck Autoencoder (BIB-AE) architecture for generating photon showers in a high-granularity calorimeter showed highly accurate modeling of various global differential shower distributions. In this work, we investigate how the BIB-AE encodes this physics information in its latent space. Our understanding of this encoding allows us to propose methods to optimize the generation performance further, for example, by altering latent space sampling or by suggesting specific changes to hyperparameters. In particular, we improve the modeling of the shower shape along the particle incident axis.
We introduce persistent Betti numbers to characterize the topological structure of jets. These topological invariants measure the multiplicity and connectivity of jet branches at a given scale threshold, while their persistence records the evolution of each topological feature as this threshold varies. With this knowledge, in particular, we are able to reconstruct the branch phylogenetic tree of each jet. These points are demonstrated in the benchmark scenario of light-quark versus gluon jets. This study provides a topological tool to develop jet taggers, and opens a new angle to look into jet physics.
Neural Stochastic Differential Equations model a dynamical environment with neural nets assigned to their drift and diffusion terms. The high expressive power of their nonlinearity comes at the expense of instability in the identification of the large set of free parameters. This work presents a recipe to improve the prediction accuracy of such models in three steps: i) accounting for epistemic uncertainty by assuming probabilistic weights, ii) incorporation of partial knowledge on the state dynamics, and iii) training the resultant hybrid model by an objective derived from a PAC-Bayesian generalization bound. We observe in our experiments that this recipe effectively translates partial and noisy prior knowledge into an improved model fit.
Recently, Generative Adversarial Networks (GANs) trained on samples of traditionally simulated collider events have been proposed as a way of generating larger simulated datasets at a reduced computational cost. In this talk we will present an argument cautioning against the usage of this method to meet the simulation requirements of an experiment, namely that data generated by a GAN cannot statistically be better than the data it was trained on.
We will also state and prove a theorem that limits the ability of GANs to replace traditional simulators in collider physics.
We show how Bayesian neural networks can be used to estimate uncertainties associated with regression, classification, and now also generative networks. For generative INNs, the combination of the learned density and uncertainty maps also provides insights into how these networks learn. These results show that criticizing the use of neural networks in LHC physics as black boxes is a sociological rather than scientific statement.
Machine learning techniques are becoming an integral component of data analysis in High Energy Physics (HEP). These tools provide a significant improvement in sensitivity over traditional analyses by exploiting subtle patterns in high-dimensional feature spaces. These subtle patterns may not be well-modeled by the simulations used for training machine learning methods, resulting in an enhanced sensitivity to systematic uncertainties.
Contrary to the traditional wisdom of constructing an analysis strategy that is invariant to systematic uncertainties, we study the use of a classifier that is fully aware of uncertainties and their corresponding nuisance parameters. We show that this dependence can actually enhance the sensitivity to parameters of interest. Studies are performed using a synthetic Gaussian dataset as well as a more realistic HEP dataset based on Higgs boson decays to tau leptons. For both cases, we show that the uncertainty-aware approach can achieve a better sensitivity than alternative machine learning strategies.
https://arxiv.org/abs/2105.08742
Monte Carlo simulations are a vital part of modern particle physics. However, classical approaches to these simulations require a vast amount of computational resources. Generative machine learning models offer a chance to reduce this strain on computing capabilities by allowing us to generate simulated data at a significantly greater speed. The applicability of such generative models has been demonstrated for many problems in particle physics, ranging from event generation to fast calorimeter simulation and more.
However, one question that needs to be addressed before we can fully utilise generative models is whether a generative model can achieve a more precise description of a given underlying distribution than the data the model was originally trained on. We explore this using a simple toy example and show that a generative model can indeed be used to amplify a data set.
The classification of jets as quark- versus gluon-initiated is an important yet challenging task in the analysis of data from high-energy particle collisions and in the search for physics beyond the Standard Model. The recent integration of deep neural networks operating on low-level detector information has resulted in significant improvements in the classification power of quark/gluon jet tagging models. However, the improved power of such models trained on simulated samples has come at the cost of reduced interpretability, raising concerns about their reliability. We elucidate the physics behind quark/gluon jet classification decisions by comparing the performance of networks with and without constraints of infrared and collinear safety, and identify the nature of the unsafe information by revealing the energy and angular dependence of the learned models. This in turn allows us to approximate the performance of the low-level networks (to 99% or higher) using equivalent sets of interpretable high-level observables, which can be used to probe the fidelity of the simulated samples and define systematic uncertainties.
Nearly five years ago we introduced tree-based recursive NN models for jet physics, which intuitively reflected the sequence of 1-to-2 splittings found in a parton shower. Subsequently, tree-based models like JUNIPR were developed as (probabilistic) generative models that could be used for classification and reweighting. One result that somewhat undermined the narrative of the connection between the inductive bias of the architectures and the underlying physics for this class of tree-based models was that they continued to perform well even if the jet algorithm that was used did not reflect the underlying physics of the parton shower (e.g. anti-kT, a simple pT ordering, or a 2d-printer). Later, even simpler models based on DeepSets were introduced that focused on permutation invariance and performed well without reference to the underlying showering picture at all. In this talk I’d like to revisit these two forms of inductive bias for models for jets from a new perspective. In particular, I will discuss the expressive power of these network architectures, their connection to the jet clustering algorithm, and make some predictions for experiments that will be conducted over the coming months.
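The permutation invariance that DeepSets-style models build in can be shown with a small sketch: each particle is mapped to features, the features are summed, and a jet-level function acts on the sum. The per-particle map, the linear read-out, and the toy jet below are hypothetical stand-ins for learned networks and real data.

```python
import math


def per_particle_map(particle):
    """Hypothetical per-particle embedding of (pt, eta, phi) into 2 features."""
    pt, eta, phi = particle
    return (pt * math.cos(phi), pt * math.tanh(eta))


def deep_sets_score(jet):
    """Sum-pool per-particle features, then apply a jet-level function."""
    features = [per_particle_map(p) for p in jet]
    # math.fsum returns a correctly rounded sum, so particle order cannot matter.
    pooled = [math.fsum(f[k] for f in features) for k in range(2)]
    # Jet-level function: a fixed linear map stands in for a trained network.
    return 0.7 * pooled[0] - 0.3 * pooled[1]


# Toy jet (assumed values): three constituents as (pt, eta, phi).
jet = [(50.0, 0.1, 0.2), (30.0, -0.2, 1.0), (5.0, 0.4, -0.5)]

score_original = deep_sets_score(jet)
score_shuffled = deep_sets_score(list(reversed(jet)))
print(score_original, score_shuffled)
```

Because the only interaction between particles is the sum, no clustering history or ordering enters the score, in contrast to tree-based models whose structure depends on the jet algorithm.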
A framework is presented to extract and understand decision-making information from a deep neural network classifier of jet substructure tagging techniques. The general method studied is to provide expert variables that augment inputs (“eXpert AUGmented” variables, or XAUG variables), then apply layerwise relevance propagation (LRP) to networks that have been provided XAUG variables and those that have not. The XAUG variables are concatenated to the classifier’s intermediate input to the final layer. The results show that XAUG variables can be used to interpret classifier behavior, increase discrimination ability when combined with low-level features, and in some cases capture the behavior of the classifier completely. The LRP technique can be used to find relevant information the network is using, and when combined with the XAUG variables, can be used to rank features, allowing one to find a reduced set of features that capture a majority of network performance. These identified XAUG variables can also be added to low-level networks as a guide to improve performance.
Four-top production (and its backgrounds) is very hard to model at the LHC; it represents a unique window for detecting top-philic NP, and its current measurements have some tension with theory and predictions. We find that simple, clean and powerful Bayesian inference can be applied to the data to infer the true signal and background distributions. We propose that these results could be used in a novel way to test for SM agreement and/or NP effects in the four-top final state at the LHC.
Machine learning (ML) is pushing through boundaries in computational physics.
Jet physics, with its large and detailed dataset, is particularly well suited.
In this talk I will discuss the application of an unusual ML technique, Spectral Clustering, to jet formation.
Spectral clustering differs from much of ML as it has no "black-box" elements.
Instead, it is based on a simple, elegant algebraic manipulation.
This allows us to inspect the way the algorithm is interpreting the data, and apply physical intuition.
Infrared-collinear (IRC) safety is of critical importance to jet physics.
IRC safety requires that jets formed are insensitive to collinear splitting and soft emissions.
We show that spectral clustering can be applied in an IRC-safe way, and note the conditions for this.
Finally, the capacity of spectral clustering to handle different datasets is shown.
Its excellent performance, both in terms of multiplicity and mass peaks, is demonstrated.
In particular we show great performance on two datasets from the extended Higgs sector, alongside the semileptonic top.
The reasons for its flexibility are discussed, and potential developments offered.
The choice of optimal event variables is crucial for achieving the maximal sensitivity of experimental analyses, and suitable kinematic variables for many well-motivated event topologies have been developed in collider physics. Here we propose a deep-learning-based algorithm to design good event variables that are sensitive to a wide range of the unknown model parameter values. We demonstrate that the neural networks trained with our algorithm on some simple event topologies are able to reproduce standard variables like invariant mass, transverse mass, and stransverse mass. These simple exercises can address two issues: 1) what have machines learned (explainability)? and 2) are human-engineered features best (optimality)? The method is automatable, completely general, and can be used to derive sensitive, previously unknown, event variables for other, more complex event topologies.
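For reference, two of the standard variables named above can be written down directly. The sketch below uses the textbook definitions of the invariant mass of two four-vectors and the transverse mass of a massless visible particle plus missing transverse momentum; the function names and numerical inputs are illustrative assumptions, and the stransverse mass is omitted because it requires a minimisation.

```python
import math


def invariant_mass(p1, p2):
    """Invariant mass of two four-vectors given as (E, px, py, pz)."""
    e, px, py, pz = (a + b for a, b in zip(p1, p2))
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def transverse_mass(pt_visible, phi_visible, met, phi_met):
    """Transverse mass for a massless visible particle and missing momentum."""
    delta_phi = phi_visible - phi_met
    return math.sqrt(2.0 * pt_visible * met * (1.0 - math.cos(delta_phi)))


# Illustrative inputs: two back-to-back massless particles of 45.6 GeV each.
m_pair = invariant_mass((45.6, 45.6, 0.0, 0.0), (45.6, -45.6, 0.0, 0.0))
# Illustrative inputs: 40 GeV lepton recoiling against 40 GeV missing momentum.
m_t = transverse_mass(40.0, 0.0, 40.0, math.pi)
print(m_pair, m_t)
```

These closed forms are the targets that a learned event variable would need to reproduce, up to a monotonic transformation, on the corresponding simple topologies.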
To obtain information on the still unknown sources of ultra-high-energy cosmic rays (UHECRs), a combined fit of the observed energy spectrum and depths of the shower maximum can be used, which constrains characteristic parameters of the sources. During propagation from the sources to Earth, UHECRs can experience numerous stochastic processes such that no explicit inverse function, which would describe the source parameters as a function of the measured quantities, can be formulated.
Previously, the high-dimensional space of possible combinations of source parameters has been investigated with Bayesian sampling methods like Markov Chain Monte Carlo (MCMC), which in general is very computationally expensive. Here, we introduce the application of a new method using deep-learning techniques, the so-called conditional Invertible Neural Network (cINN). The network implicitly learns a mapping between the source parameters and the observables from a large set of training data. The backward pass through the trained invertible network then provides the full posterior probabilities including correlations. It needs far fewer computational resources than the MCMC and can be evaluated in seconds, and therefore the cINN enables extensive rapid tests of the method’s accuracy. In this work, we compare the results of the two methods, MCMC and cINN, applied to a simulated scenario inspired by current UHECR measurements of the energy spectrum and the depths of shower maximum distributions. For the cINN, we also evaluate the performance on many test data sets and show that the mean estimates of the source parameters as well as the widths of the posterior distributions can be described accurately.
I describe a new machine learning algorithm, Via Machinae, to identify cold stellar streams in data from the Gaia telescope. Via Machinae is based on ANODE, a general method that uses conditional density estimation and sideband interpolation to detect local overdensities in the data in a model agnostic way. By applying ANODE to the positions, proper motions, and photometry of stars observed by Gaia, Via Machinae obtains a collection of those stars deemed most likely to belong to a stellar stream. In this talk, I will provide an overview of the Via Machinae algorithm, using the known stream GD-1 as a worked example, and show preliminary results of our analysis across the full sky.
I will give a very brief (and incomplete) review of quantum machine learning techniques and then focus on novel quantum computing approaches for the task of finding a solution to an optimisation problem. I will then give explicit examples of how quantum machine learning techniques can be used for classification tasks and to calculate solutions to nonperturbative problems in quantum field theory.