Machine learning has become a hot topic in particle physics over the past several years. In particular, there has been a lot of progress in the areas of particle and event identification, reconstruction, generative models, anomaly detection and more. In this conference, we will discuss current progress in these areas, focusing on new breakthrough ideas and existing challenges.
The ML4Jets workshop will be open to the full community and will include LHC experiments as well as theorists and phenomenologists interested in this topic. We explicitly welcome contributions and participation from method scientists as well as adjacent scientific fields such as astronomy, astrophysics, astroparticle physics, hadron- and nuclear physics and other domains facing similar challenges.
This year's conference is organised jointly by DESY and Universität Hamburg and hosted at the DESY campus. It follows conferences in 2017, 2018, 2020, 2021, and 2022.
In-person registration and abstract submission are closed. Remote registration is possible until the end of the workshop.
The workshop will be organised in a hybrid format (with a Zoom connection option). We expect speakers to attend in person.
Registration for both in-person and Zoom participation will be free of charge and will, at a minimum, include coffee breaks for in-person participants. We are looking into an opt-in dinner and will announce details and potential extra costs closer to the event.
Join the ML4Jets Slack Channel for discussions.
Local Organizing Committee:
Freya Blekman (DESY & Universität Hamburg)
Andrea Bremer (Universität Hamburg)
Frank Gaede (DESY)
Gregor Kasieczka (Universität Hamburg, chair)
Andreas Hinzmann (DESY)
Matthias Schröder (Universität Hamburg)
International Advisory Committee:
Florencia Canelli (University of Zurich)
Kyle Cranmer (NYU)
Vava Gligorov (LPNHE)
Gian Michele Innocenti (CERN)
Ben Nachman (LBNL)
Mihoko Nojiri (KEK)
Maurizio Pierini (CERN)
Tilman Plehn (Heidelberg)
David Shih (Rutgers)
Jesse Thaler (MIT)
Sofia Vallescorsa (CERN)
The full simulation of particle colliders incurs a significant computational cost. Among the most resource-intensive steps are detector simulations. It is expected that future developments, such as higher collider luminosities and highly granular calorimeters, will increase the computational resource requirement for simulation beyond availability. One possible solution is generative neural networks that can accelerate simulations. Normalizing flows are a promising approach: it has been previously demonstrated that such flows can generate showers in calorimeters with high accuracy. However, the main drawback of normalizing flows with fully connected sub-networks is that they scale poorly with input dimension. We overcome this issue by using a U-Net-based flow architecture and show how it can be applied to accurately simulate showers in highly granular calorimeters.
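As a rough illustration of the idea (a minimal sketch, not the authors' code), the block below shows an affine coupling layer whose scale/shift subnetwork is convolutional, so the parameter count is set by kernel sizes rather than the input dimensionality; a full U-Net would replace the two-layer stand-in.

```python
# Sketch: coupling layer with a convolutional (U-Net-like) subnetwork.
# All layer sizes are illustrative.
import torch
import torch.nn as nn

class ConvCoupling(nn.Module):
    def __init__(self, channels=1):
        super().__init__()
        # stand-in for a full U-Net: two conv layers
        self.net = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 2 * channels, 3, padding=1),
        )

    def forward(self, x):
        # condition on one half of the "calorimeter image", transform the other
        x1, x2 = x.chunk(2, dim=2)
        s, t = self.net(x1).chunk(2, dim=1)
        s = torch.tanh(s)                      # keep scales bounded
        y2 = x2 * torch.exp(s) + t
        log_det = s.flatten(1).sum(dim=1)      # log |det J| of the affine map
        return torch.cat([x1, y2], dim=2), log_det

y, log_det = ConvCoupling()(torch.randn(8, 1, 16, 16))
```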
Normalizing-flow architectures have shown outstanding performance in various generative tasks at the LHC. However, they do not scale well to higher-dimensional datasets. We investigate several directions to improve normalizing flows for calorimeter shower simulation: 1) using a coupling-layer-based flow to improve training and generation times without dimensionality reduction, and 2) using a VAE to compress the very high-dimensional datasets 2 and 3 of the CaloChallenge.
The use of machine learning for collider data generation has become a significant area of study within particle physics. This interest arises from the increasing computational difficulties associated with traditional Monte Carlo simulation methods, especially in the context of future high-luminosity colliders. Representing collider data as particle clouds introduces several advantageous aspects, e.g. the intricate correlations present in particle clouds can be used as sensitive tests for the accuracy of a generative model in approximating and sampling the underlying probability density. The complexities are further amplified by variable particle cloud sizes, which necessitate the use of more sophisticated models.
In this study, we present a novel model that employs an attention-based aggregation mechanism to address these challenges. The model uses adversarial training, ensuring the generator and critic exhibit permutation equivariance and invariance respectively with respect to their input. A feature matching loss for the generator is also introduced to stabilize the training process. The proposed model competes favourably with the state-of-the-art on the JetNet150 dataset, whilst demonstrating a significantly reduced parameter count compared to other top-tier models. Additionally, the model is applied to CaloChallenge dataset 2 and 3, where it yields promising results.
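A hedged sketch of attention-based aggregation in the spirit described above (names and sizes are illustrative, not the model's code): a learnable seed query attends over the particle cloud, yielding a summary for the critic that is invariant under permutations of the inputs.

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    # permutation-invariant pooling via attention with a learnable seed query
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.seed = nn.Parameter(torch.randn(1, 1, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, particles):               # (batch, n_particles, dim)
        seed = self.seed.expand(particles.size(0), -1, -1)
        pooled, _ = self.attn(seed, particles, particles)
        return pooled.squeeze(1)                # (batch, dim), order-independent

jets = torch.randn(32, 150, 64)                 # JetNet150-sized toy clouds
print(AttentionPool()(jets).shape)              # torch.Size([32, 64])
```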
Simulation of the calorimeter response is a crucial part of detector studies for modern high energy physics. The computational cost of conventional MC-based simulation is becoming a major bottleneck with increasingly large and highly granular detector designs. We propose a two-step generative model for fast calorimeter simulation based on the Vector-Quantized Variational Autoencoder (VQ-VAE). This model achieves fast generation (< 1 ms/shower) for a dataset with about 500 dimensions, and the chi2 difference of energy compared to GEANT4 is less than 0.01. We also demonstrate the flexibility of this latent generative design, which can adapt to a variety of encoder/decoder architectures and scale up to larger datasets with more than 40,000 dimensions, with generation time scaling better than O(N).
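A minimal sketch of the vector-quantization step at the heart of a VQ-VAE (generic, not the authors' implementation): encoder outputs are snapped to their nearest codebook entries with a straight-through gradient, and the resulting discrete indices are what the second-stage prior models.

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, n_codes=512, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(n_codes, dim)

    def forward(self, z):                        # (batch, n_tokens, dim)
        codes = self.codebook.weight.unsqueeze(0).expand(z.size(0), -1, -1)
        idx = torch.cdist(z, codes).argmin(dim=-1)   # nearest code per token
        z_q = self.codebook(idx)
        z_q = z + (z_q - z).detach()             # straight-through estimator
        return z_q, idx                          # idx feeds the 2nd-stage prior

z_q, idx = VectorQuantizer()(torch.randn(4, 32, 64))
```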
The simulation requirements of experiments in high energy physics place major demands on the available computing resources. These simulation pressures are projected to increase further at the upcoming high luminosity phase of the LHC and for future colliders. An additional challenge arises from the significantly higher granularity present in future detectors, which increases the physical accuracy required of a surrogate simulator. Machine learning methods based on deep generative models have the potential to provide simulation tools that could help to alleviate these computational pressures.
While significant progress has been made on the development of generative models for the simulation of showers in highly granular calorimeters, key challenges have yet to be addressed. In particular, these simulators must be able to provide an appropriate detector response for particles incident at various positions and under different angles. This contribution will present progress on these requirements by generalising the performant Bounded Information Bottleneck Autoencoder (BIB-AE) architecture to multi-parameter conditioning scenarios. Particular focus will be given to the high degree of physics fidelity achieved after interfacing with state-of-the-art reconstruction algorithms. Additionally, progress on the integration of these surrogate simulators into full simulation chains will be discussed. These advances represent key steps towards benchmarking the performance of such simulators on full physics events.
When measuring rare processes at Belle II, a huge luminosity is required, which means a large number of simulated events is necessary to determine signal efficiencies and background contributions. However, this simulation demands high computational cost, while most of the simulated data, in particular in the case of background, are discarded by the event selection. Thus, filters using graph neural networks are introduced at an early stage to save the resources spent on detector simulation and reconstruction of events that would be discarded at analysis level. In our work, we improved the performance of the filters using graph attention and investigated statistical methods, including sampling and reweighting, to deal with the biases introduced by the filtering.
Experimental uncertainties related to the calibration of hadronic objects (particularly the jet energy scale and resolution) can limit the precision of physics analyses at the LHC, and so improvements in performance have the potential to broadly increase the impact of results. Such settings are among the most promising for cutting-edge machine learning and artificial intelligence algorithms at the LHC. Recent refinements to ATLAS hadronic object reconstruction and calibration procedures using ML and in situ techniques result in reduced uncertainties, improved pileup stability, and other performance gains. In this contribution, new developments in this area will be presented.
This talk will overview the usage of boosted multi-prong jet tagging in CMS and how such taggers are calibrated. It will highlight a new method for calibrating the tagging of multi-prong jets using the Lund Jet Plane to correct the substructure of simulated jets. The method is shown to significantly improve the data-simulation agreement of substructure observables.
The identification of heavy-flavour jets (tagging) remains a critical task at hadron colliders. A key signature of such jets is the displaced decay vertices left by boosted b- and c-hadrons. While earlier tagging algorithms relied on manually designed procedures to identify and fit vertices, they were succeeded by edge-classification-based Graph Neural Networks (GNNs) that, despite identifying vertices, fell short of reconstructing their properties. We propose the use of a transformer architecture for vertex reconstruction inside jets. Using reconstructed tracks, our approach is able to simultaneously identify the decay of heavy-flavour hadrons, assign tracks to the respective decay vertices, and determine each vertex's properties, overcoming a key limitation of previous ML-based approaches to vertex reconstruction.
Physics measurements in the highly Lorentz-boosted regime, including the search for the Higgs boson or beyond standard model particles, are a critical part of the LHC physics program. In the CMS Collaboration, various boosted-jet tagging algorithms, designed to identify hadronic jets originating from a massive particle decaying to bb̅ or cc̅, have been developed and deployed in a variety of analyses. This talk highlights their performance on simulated events, and summarises the novel calibration methods of these algorithms with 2016-2018 data collected in proton-proton collisions at √s = 13 TeV. Three distinct control regions are studied, selected via machine learning techniques or the presence of reconstructed muons from g → bb̅ (cc̅) decays, as well as regions selected from Z boson decays. The calibration results, derived through a combination of measurements in these three regions, are presented.
Inspired by the recent successes of language modelling and computer vision machine learning techniques, we study the feasibility of repurposing these developments for particle track reconstruction in the context of high energy physics. In particular, drawing from developments in the field of language modelling we showcase the performance of multiple implementations of the transformer model, including an autoregressive transformer with the original encoder-decoder architecture, and encoder-only architectures for the purpose of track parameter classification and clustering. Furthermore, in the context of computer vision we study a U-net style model with submanifold convolutions, treating the event as an image and highlighting those pixels where a hit was detected.
We benchmark these models on simplified training data utilising a recently developed simulation framework, REDuced VIrtual Detector (REDVID). These data include noisy linear and helical track definitions, similar to those observed in particle detectors from major LHC collaborations such as ATLAS and CMS. We find that the proposed models can be used to effectively reconstruct particle tracks on this simplified dataset, and we compare their performances both in terms of reconstruction efficiency and runtime. As such, this work lays the necessary groundwork for developments in the near future towards such novel machine learning strategies for particle tracking on more realistic data.
The FCC will deliver a large dataset thanks to its unprecedented luminosity. Improving the quality of event reconstruction at different levels will increase the accuracy of the physics measurements we can achieve. For example, at the particle-level reconstruction, where information from different sub-detectors (e.g. tracker and calorimeter) is available, ML shows promise to improve the reconstruction by learning to disentangle complex or overlapping shower geometries. At a higher level, for reconstructing colour-neutral resonances such as W, Z or Higgs particles, similar tools can improve the clustering performance using an end-to-end approach and reduce errors coming from inaccurately clustering soft particles from various resonances or incorrect jet pairing. Our work focuses on the study of GNN architectures that are scalable and improve the performance on these complex tasks over classical approaches.
Time-of-flight (TOF) reconstruction is under investigation as a method to enhance the particle identification capabilities of detectors proposed for future Higgs factories. By utilising time measurements based on energy deposits of showers in the calorimeter system, the TOF of the particle can be inferred. The focus of our studies is the International Large Detector (ILD), a proposed detector for operation at a future Higgs factory.
Since the current TOF estimator used by ILD can only extract information from a limited number of calorimeter hits, we propose to use machine learning (ML) algorithms that are able to operate on a significantly increased fraction of hits in the shower and thereby access additional information.
Results will be presented for a convolutional neural network and a network using equivariant point cloud (EPiC) layers, operating on calorimeter showers represented as point clouds combined with track feature information. A comparison to the existing TOF estimator will highlight the significant improvements achieved.
Tree structures are a natural way to represent particle decays in high energy physics. Reconstructing the entire decay tree that ends in the stable particles entering the detector, starting from the leaf nodes (the reconstructed particles), is an interesting and potentially beneficial task. We propose a graph-based neural network for tree reconstruction using truth-level particles as a starting point. The proposed model's performance was evaluated on the toy Phasespace dataset and on realistic Pythia8 simulations of light-quark decay chains.
Most searches at the LHC employ an analysis pipeline consisting of various discrete components, each individually optimized and later combined to provide relevant features used to discriminate SM background from potential signal. These are typically high-level features constructed from particle four-momenta. However, the combination of individually optimized tasks does not guarantee optimal performance on the final analysis objective. In this study, we show how an analysis would benefit from adopting an end-to-end ML optimization approach. Specifically, we investigate the impact of jointly optimizing particle identification and signal-versus-background discrimination using the ParT transformer architecture [arXiv:2202.03772], showing its effectiveness in the case of multi-jet final states with CMS open data [DOI:10.7483/OPENDATA.CMS.JGJX.MS7Q].
In this work we introduce ν²-Flows, an extension of the ν-Flows method to final states containing multiple neutrinos. The architecture can natively scale to all combinations of object types and multiplicities in the final state, for any desired neutrino multiplicity. In ttbar dilepton events, the momenta of both neutrinos and the correlations between them are reconstructed more accurately than with the most popular standard analytical techniques, and solutions are found for all events. Inference time is significantly faster than competing methods and can be reduced further by evaluating in parallel on graphics processing units. We apply ν²-Flows to ttbar dilepton events and show that the per-bin uncertainties in unfolded distributions are much closer to the limit of performance set by perfect neutrino reconstruction than standard techniques. For the chosen double-differential observables, ν²-Flows improves the statistical precision in each bin by a factor of 1.5 to 2 in comparison to the Neutrino Weighting method, and by up to a factor of four in comparison to the Ellipse approach.
Axion-like particles (ALPs) arise in beyond the Standard Model theories with global symmetry breaking. Several experiments have been constructed and proposed to look for them at different energy scales. We focus here on beam-dump experiments looking for GeV scale ALPs with macroscopic decay lengths. In this work we show that using ML we can reconstruct the ALP properties (mass and lifetime) even from inaccurate detector observations. We use a simulation-based inference approach based on conditional invertible neural networks to reconstruct the posterior probability of the ALP parameters. This neural network outperforms parameter reconstruction from conventional high-level observables while at the same time providing reliable uncertainty estimates. Moreover, the neural network can be quickly trained for different detector properties, making it an ideal framework for optimizing experimental design.
At experiments at the LHC, a growing reliance on fast Monte Carlo applications will accompany the high luminosity and detector upgrades of the Phase 2 era. Traditional FastSim applications which have already been developed over the last decade or more may help to cope with these challenges, as they can achieve orders of magnitude greater speed than standard full simulation applications. However, this advantage comes at the price of decreased accuracy in some of the final analysis observables. In this contribution, a machine learning-based technique to refine those observables is presented. We employ a regression neural network trained with an optimized combination of multiple loss functions to provide post-hoc corrections to samples produced by a standard FastSim application based on the CMS detector. The results show considerably improved agreement with a detailed MC application and an improvement in correlations among output observables and external parameters. This technique is a promising replacement for existing correction factors, providing higher accuracy and thus contributing to the wider usage of fast simulation applications.
Machine learning-based simulations, especially calorimeter simulations, are promising tools for approximating the precision of classical high energy physics simulations with a fraction of the generation time. Nearly all methods proposed so far learn neural networks that map a random variable with a known probability density, like a Gaussian, to realistic-looking events. In many cases, physics events are not close to Gaussian and so these neural networks have to learn a highly complex function. We study an alternative approach: Schrödinger bridge Quality Improvement via Refinement of Existing Lightweight Simulations (SQuIRELS). SQuIRELS leverages the power of diffusion-based neural networks and Schrödinger bridges to map between samples where the probability density is not known explicitly. We apply SQuIRELS to the task of refining a classical fast simulation to approximate a full classical simulation. On simulated calorimeter events, we find that SQuIRELS is able to reproduce highly non-trivial features of the full simulation with a fraction of the generation time.
Calorimeter shower simulation is a major bottleneck in the Large Hadron Collider computational pipeline. There have been recent efforts to employ deep generative surrogate models to overcome this challenge. However, many of the best-performing models have training and generation times that do not scale well to high-dimensional calorimeter showers. We introduce SuperCalo, a flow-based super-resolution model, and demonstrate that high-dimensional fine-grained calorimeter showers can be quickly upsampled from coarse-grained showers. This novel approach presents a way to reduce the computational cost, memory requirements and generation time associated with fast calorimeter simulation models.
Accurately reconstructing particles from detector data is a critical challenge in experimental particle physics. The detector's spatial resolution, specifically the calorimeter's granularity, plays a crucial role in determining the quality of the particle reconstruction. It also sets the upper limit for the algorithm's theoretical capabilities. Super-resolution techniques can be explored as a promising solution to address the limitations imposed by the detector's spatial resolution. Super-resolution refers to enhancing the resolution of low-resolution images to obtain higher-resolution versions. In the specific case of calorimeter data, which is characterized by sparsity and non-homogeneity, representing it using graphs provides the most faithful representation. Building upon this idea, we propose a diffusion model for graph super-resolution that uses a transformer-based de-noising network to enhance the resolution of calorimeter data. Notably, this study represents the first instance of applying graph super-resolution with diffusion. The low-resolution image, corresponding to recorded detector data, is also subject to noise from various sources. As an added benefit, the proposed model aims to remove these noise artifacts, further contributing to improved particle reconstruction.
Photons are important objects at collider experiments. For example, the Higgs boson is studied with high precision in the diphoton decay channel. For this purpose, it is crucial to achieve the best possible spatial resolution for photons and to discriminate against other particles which mimic the photon signature, mostly Lorentz-boosted $\pi^0\to\gamma\gamma$ decays.
In this talk, a study of super-resolution algorithms for photons is presented. We utilize Wasserstein generative adversarial networks based on the ESRGAN architecture, augmented by a physics-driven perceptual loss term and other modifications. The energy depositions of simulated showers of photons and neutral-pion decays in a PbWO4 calorimeter are treated as 2D images, which are upsampled with our super-resolution networks by a factor of four in each dimension. The generated images reproduce features of the simulated high-resolution showers that are not obvious from the nominal resolution. It is shown that using the artificially enhanced images for the reconstruction of shower-shape variables and the positions of the shower centers results in significant improvements. In addition, it is illustrated that the performance of deep-learning-based identification algorithms can be enhanced by using super-resolution as image preprocessing if only low statistics are available in the classifiers' training sample.
Supervised learning has been used successfully for jet classification and to predict a range of jet properties, such as mass and energy. Each model learns to encode jet features, resulting in a representation that is tailored to its specific task. But could the common elements underlying such tasks be combined in a single model trained to extract features generically? To address this question, we explore self-supervised learning (SSL), inspired by its applications in the domains of computer vision and natural language processing. Besides offering a simpler and more resource-effective route when learning multiple tasks, SSL can be trained on unlabeled data. We demonstrate that a jet representation obtained through self-supervised learning can be readily fine-tuned for downstream tasks of jet kinematics prediction and tagging, and provides a solid basis for unsupervised anomaly detection. Compared to existing studies in this direction, we use a realistic full-coverage calorimeter simulation, leading to results that more faithfully reflect the prospects at real collider experiments.
We present CoCo (Contrastive Combinatorics), a new approach using contrastive learning to solve object assignment in HEP. By utilizing contrastive objectives, CoCo aims to pull jets originating from the same parent closer together in an embedding space while pushing unrelated jets apart.
This approach can be extended natively to have multiple objectives for each subsequent particle in a decay chain, and results in a flexible and interpretable embedding space.
After learning an embedding, we can perform a clustering in this space to recover the final assignment of jets to their parent particles.
We benchmark our performance against the chi2 method, as well as against Topographs, on the ttbar system.
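A minimal sketch of the contrastive objective (an assumed InfoNCE-style form, not the CoCo code): same-parent jets are treated as positive pairs in the embedding space.

```python
import torch
import torch.nn.functional as F

def coco_loss(emb, parent, temperature=0.1):
    emb = F.normalize(emb, dim=1)                     # (n_jets, dim)
    sim = emb @ emb.T / temperature
    sim.fill_diagonal_(float("-inf"))                 # exclude self-pairs
    pos = parent.unsqueeze(0) == parent.unsqueeze(1)  # same-parent mask
    pos.fill_diagonal_(False)
    return -(F.log_softmax(sim, dim=1)[pos]).mean()

emb = torch.randn(6, 32)                          # e.g. 6 jets in a ttbar event
parent = torch.tensor([0, 0, 0, 1, 1, 1])         # top vs anti-top assignment
print(coco_loss(emb, parent))
```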
Unsupervised machine learning enables us to utilize all available information within a jet to identify anomalies. Nevertheless, the network's need to acquire knowledge about the inherent symmetries within the raw data structure can hinder this process. Self-supervised contrastive learning of representations offers a novel approach that preserves physical symmetries in the data while retaining the crucial discriminating features, based on fewer assumptions. We introduce darkCLR, a transformer-encoder network developed for the self-supervised identification of semi-visible jets. Finally, training a density-based NAE on the resulting representations yields improved performance metrics, including AUC and signal efficiency.
Last year we proposed a novel hypergraph-based algorithm (HGPflow) for one-shot prediction of particle cardinality, class, and kinematics in a dataset of single jets. This approach has the advantage of introducing energy conservation as an inductive bias, promoting both interpretability and performance gains at the particle and jet levels. We now deploy an upgraded version of HGPflow to the “big picture” of full proton-proton collisions in a realistic detector simulation and study how its success at the local scale translates into event-level quantities.
We study scalable machine learning models for full event reconstruction in high-energy electron-positron collisions based on a highly granular detector simulation. Particle-flow (PF) reconstruction can be formulated as a supervised learning task using tracks and calorimeter clusters or hits. We compare a graph neural network and kernel-based transformer and demonstrate that both avoid quadratic memory allocation and computational cost while achieving realistic PF reconstruction. We show that hyperparameter tuning on a supercomputer significantly improves the physics performance of the models. We also demonstrate that the resulting model is highly portable across hardware processors, supporting Nvidia, AMD, and Intel Habana cards. Finally, we demonstrate that the model can be trained on highly granular inputs consisting of tracks and calorimeter hits, resulting in a competitive physics performance with the baseline. Datasets and software to reproduce the studies are published following the findable, accessible, interoperable, and reusable (FAIR) principles.
The BERT pretraining paradigm has proven highly effective in many domains, including natural language processing, image processing and biology. To apply the BERT paradigm, the data need to be described as a set of tokens, and each token needs to be labelled. To date, the BERT paradigm has not been explored in the context of HEP. The samples that form the data used in HEP can be described as a set of particles (tokens), where each particle is represented as a continuous vector. We explore different approaches for discretising/labelling particles such that BERT pretraining can be performed, and demonstrate the utility of the resulting pretrained models on common downstream HEP tasks.
The reconstruction of physical observables in hadron collider events from recorded experimental quantities is a recurring task in almost any data analysis at the LHC. While the experiments record hits in tracking detectors and signals in the calorimeters, which are subsequently combined into particle-flow objects, jets, muons, electrons, missing transverse energy, or similar high-level objects, the observables of interest are commonly related to the dynamics of the particles created in the hard collision, like top quarks, weak bosons (W, Z), or the Higgs boson. Their reconstruction is more challenging, suffering from combinatorial ambiguities, tagging inefficiencies, acceptance losses, pile-up, or other experimental effects.
We present a new strategy for the reconstruction of hadron collider events using mini-jets as the only reconstructed objects, together with a machine-learning algorithm for the determination of observables of interest. These mini-jets are obtained with a distance measure of R=0.1 and reduce the full information from all particles in an event to an experimentally and computationally manageable size. We show that, with the help of a deep neural network, observables related to intermediate W bosons or top quarks can be directly regressed, as well as particle-level jets with larger R, or dressed leptons. This ansatz outperforms classical reconstruction algorithms and paves the way for a simplified and more generic event reconstruction for future LHC analyses.
The NA61/SHINE experiment is a prominent venture in high-energy physics, located at the SPS accelerator at CERN. Recently, the experiment's physics program was expanded, necessitating a comprehensive overhaul of its detector configuration. This upgrade is primarily geared towards raising the event flow rate from 80 Hz to 1 kHz and involves a substantial alteration of the read-out electronics in the core tracking detectors of NA61/SHINE, the Time Projection Chambers (TPCs). In light of the substantial surge in collected data, the deployment of an online noise-filtering tool became imperative. Traditionally, this task has relied on the reconstruction of particle tracks and the subsequent removal of clusters not associated with any discernible particle trajectory; however, this method consumes considerable time and computational resources.
In 2022, a first dataset was collected with the upgraded detector system. On these data, a collection of deep learning models was developed, employing two distinct categories of neural networks: dense and convolutional networks (DNNs, CNNs).
Of particular significance is the seamless integration of the trained models into the existing NA61/SHINE C++ software framework, using the TensorFlow C++ library; to facilitate deployment, the models were containerized with Docker. This presentation unveils the results attained by applying these algorithms for noise reduction, covering training times for both CNN and DNN models, post-filtering data reconstruction duration, and a Receiver Operating Characteristic (ROC) analysis of the filtered data.
The basic signals of the ATLAS calorimeters are three-dimensional clusters of topologically connected cell signals, formed by following signal-significance patterns. These topo-clusters provide measures of their shape, location and signal character, which are employed to apply a local hadronic calibration. The corresponding multi-dimensional calibration functions are determined by training neural networks to learn the basic topo-cluster response. Selected results from this approach are compared to the standard method using look-up tables. Significant improvements are found with respect to signal linearity and resolution.
The upcoming high-luminosity upgrade of the LHC will lead to a factor of five increase in instantaneous luminosity during proton-proton collisions. Consequently, the experiments situated around the collider ring, such as the CMS experiment, will record approximately ten times more data. Furthermore, the luminosity increase will result in significantly higher data complexity, thus making more sophisticated and efficient real-time event selection algorithms an unavoidable necessity in the future of the LHC.
One particular facet of the looming increase in data complexity is the availability of information pertaining to the individual constituents of a jet at the first stage of the event filtering system, known as the level-1 trigger. Therefore, more intricate jet identification algorithms that utilise this additional constituent information can be designed if they meet the strict latency, throughput, and resource requirements. In this work, we construct, deploy, and compare fast machine-learning algorithms, including graph- and set-based models, that exploit jet constituent data on field-programmable gate arrays (FPGAs) to perform jet classification. The latencies and resource consumption of the studied models are reported. Through quantization-aware training and efficient FPGA implementations, we show that O(100) ns inference of complex models like graph neural networks and deep sets is feasible at low resource cost.
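As a hedged illustration of quantization-aware training (a generic straight-through fake-quantization scheme, not the actual FPGA toolflow): weights are rounded to a few bits in the forward pass, mimicking fixed-point arithmetic, while gradients flow through the unquantized values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quant(w, bits=6):
    scale = 2 ** (bits - 1)
    q = torch.clamp(torch.round(w * scale), -scale, scale - 1) / scale
    return w + (q - w).detach()                  # straight-through estimator

class QLinear(nn.Linear):
    def forward(self, x):
        return F.linear(x, fake_quant(self.weight), self.bias)

model = nn.Sequential(QLinear(16, 32), nn.ReLU(), QLinear(32, 5))
logits = model(torch.randn(8, 16))               # 8 jets, 16 constituent features
```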
The High-Luminosity upgrade to the LHC will deliver unprecedented luminosity to the experiments, culminating in up to 200 overlapping proton-proton collisions. In order to cope with this challenge, several elements of the CMS detector are being completely redesigned and rebuilt. The Level-1 Trigger is one such element; it will have a 12.5 microsecond window in which to process protons colliding at a rate of 40 MHz, and reduce this down to 750 kHz. The key attribute of a trigger system is to retain the signals which would benefit from further analysis and thus should be stored on disk. This upgraded trigger, as in the present design, will utilise an all-FPGA solution. Although rules-based algorithms have traditionally been used for this purpose, the emergence of new-generation FPGAs and machine learning toolsets has enabled neural networks to be proposed as an alternative architecture. We present the design and implementation of a Convolutional Neural Network (CNN) on an FPGA to demonstrate the feasibility of such an approach. Results will be presented for a baseline signal model of a pair of Higgs bosons decaying to four b-quarks. The model architecture, resource usage, latency and implementation floorplan will all be presented. Latest results will also be shown of studies using domain-specific knowledge to enhance the network's inference capability.
We present the preparation, deployment, and testing of an autoencoder trained for unbiased detection of new physics signatures in the CMS experiment Global Trigger test crate FPGAs during LHC Run 3. The Global Trigger makes the final decision whether to readout or discard the data from each LHC collision, which occur at a rate of 40 MHz, within a 50 ns latency. The Neural Network makes a prediction for each event within these constraints, which can be used to select anomalous events for further analysis. The implementation occupies a small percentage of the resources of the system Virtex 7 FPGA in order to function in parallel to the existing logic. The GT test crate is a copy of the main GT system, receiving the same input data, but whose output is not used to trigger the readout of CMS, providing a platform for thorough testing of new trigger algorithms on live data, but without interrupting data taking. We describe the methodology to achieve ultra low latency anomaly detection, and present the integration of the DNN into the GT test crate, as well as the monitoring, testing, and validation of the algorithm during proton collisions.
In the search for exotic events involving displaced particles at the HL-LHC, triggering at the level-1 (L1) system will pose a significant challenge. This is particularly relevant in scenarios where low-mass long-lived particles (LLPs) are coupled to a Standard Model (SM)-like 125 GeV Higgs boson and decay into jets. The complexity arises from the low hadronic activity resulting from the LLP decay and the existing triggers' inability to efficiently select displaced events. This study introduces a novel machine learning approach to address this challenge, utilizing a lightweight autoencoder architecture designed for the low latency requirements at L1. Focusing on light LLPs with masses of 10, 30 and 50 GeV and decay lengths ranging from 1 to 100 cm, this approach employs "Edge convolution" on L1 reconstructed tracks. The results show notable signal acceptance at the permissible background rate, primarily originating from minimum-bias and QCD di-jet events.
In the world of particle physics experiments, we often deal with data lying in high-dimensional spaces. Tasks like navigating and comparing these data points become challenging, but can be simplified with dimensionality reduction methods. In this work, we develop a method for mapping data originating from both Standard Model processes and various theories Beyond the Standard Model into a unified representation space while conserving information about the relationship between the underlying processes. We show that such mapping techniques can be learned by a neural network and that the arrangement of processes within this representation space is stable and based on the physical properties of the processes. These results were achieved by applying neural embedding and contrastive learning to decay data by either conserving a pairwise distance or by learning similarities and differences between the signals. The resulting arrangements are easy to interpret and show interesting relationships between the data sets.
State-of-the-art (SoTA) deep learning models have achieved tremendous improvements in jet classification performance while analyzing low-level inputs, but their decision-making processes have become increasingly opaque. We introduce an analysis model (AM) that combines several phenomenologically motivated neural networks to circumvent the interpretability issue while maintaining high classification performance. Our methodology incorporates networks that scrutinize two-point energy correlations, generalizations of particle multiplicities via Minkowski functionals, and subjet momenta. At the hadronic-calorimeter angular resolution scale, this AM performs comparably to the SoTA models (such as ParticleTransformer and ParticleNet) in top jet tagging.
Subsequently, we explore the generator systematics of top versus QCD jet classification among event samples generated from different event generators (Pythia, Vincia, and Herwig) using both SoTA models and our AM. Both models can accurately discern differences between simulations, enabling us to adjust the systematic differences via reweighting using classifier outputs. Furthermore, AMs equipped with partial high-level inputs (AM-PIPs) can identify relevant high-level features; if critical features are omitted from the AM inputs, reweighting is affected adversely. We also visualize our correction method, focusing on important variables in top jet tagging identified by the DisCo method.
Particle jets exhibit tree-like structures through stochastic showering and hadronization. The hierarchical nature of these structures aligns naturally with hyperbolic space, a non-Euclidean geometry that captures hierarchy intrinsically. Drawing upon the foundations of geometric learning, we introduce hyperbolic transformer models tailored for tasks relevant to jet analyses, such as classification and representation learning. Through jet embeddings and jet tagging evaluations, our hyperbolic approach outperforms its Euclidean counterparts. These findings underscore the potential of using hyperbolic geometric representations in advancing jet physics analyses.
Based on JHEP 09 (2023) 084.
Hadronization is a critical step in the simulation of high-energy particle and nuclear physics experiments. As there is no first-principles understanding of this process, physically inspired hadronization models have a large number of parameters that are fit to data. Deep generative models are a natural replacement for classical techniques, since they are more flexible and may be able to improve the overall precision. Proof-of-principle studies have shown how to use neural networks to emulate specific hadronization models when trained using the inputs and outputs of classical methods. However, these approaches will not work with data, where we do not have a matching between observed hadrons and partons. In this paper, we develop a protocol for fitting a deep generative hadronization model in a realistic setting, where we only have access to a set of hadrons in data. Our approach uses a variation of a Generative Adversarial Network with a permutation-invariant discriminator. We find that this setup is able to match the hadronization model in Herwig with multiple sets of parameters. This work represents a significant step forward in a longer-term program to develop, train, and integrate machine learning-based hadronization models into parton shower Monte Carlo programs.
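A permutation-invariant discriminator of the kind mentioned above might look like the following deep-sets-style sketch (illustrative, not the paper's architecture): per-hadron features are embedded, summed, and classified, so the output is independent of hadron ordering.

```python
import torch
import torch.nn as nn

class SetDiscriminator(nn.Module):
    def __init__(self, feat=4, hidden=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(feat, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, hadrons):                  # (batch, n_hadrons, feat)
        return self.rho(self.phi(hadrons).sum(dim=1))  # sum => order-invariant

d = SetDiscriminator()
print(d(torch.randn(2, 10, 4)).shape)            # (2, 1), for any hadron ordering
```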
We introduce two novel techniques for the efficient generation of jets as low-level particle clouds. Firstly, we present EPiC-JeDi, which integrates the score-based diffusion model from PC-JeDI with the fast and computationally efficient equivariant point cloud (EPiC) layers used in the EPiC-GAN. Secondly, we introduce EPiC-FM, which shares the same architecture but employs a continuous normalizing flow approach trained using optimal transport flow matching (FM). Our models not only achieve competitive performance compared to the current state-of-the-art methods in terms of various metrics assessing the quality of generated jets but also maintain rapid generation speeds.
In this talk, we introduce a method for efficiently generating jets in the field of High Energy Physics. Our model is designed to generate ten different types of jets, expanding the versatility of jet generation techniques. Beyond the kinematic features of the jet constituents, our model also excels in generating informative features that provide insight into the types of jet constituents, such as features which indicate if a constituent is an electron or a photon, offering a more comprehensive understanding of the generated jets. Furthermore, our model incorporates valuable impact-parameter information, enhancing its potential utility in high energy physics research.
In particle physics, precise simulations of the interaction processes in calorimeters are essential for scientific discovery. However, accurate simulations using GEANT4 are computationally very expensive and pose a major challenge for the future of particle physics. In this study, we apply CaloPointFlow, a novel generative model based on normalizing flows and adapted from the PointFlow model for 3D shape generation, to fast, high-fidelity calorimeter shower generation using point clouds that exploit the sparsity and leverage the geometry of the data. We preprocess the voxelized datasets of the Fast Calorimeter Simulation Challenge 2022 into point clouds and apply the model to all three datasets without any adaptation. Furthermore, we evaluate the performance of our model on metrics such as energy resolution, longitudinal and transverse shower profiles, and shower shapes, and compare it with GEANT4. We demonstrate that our model can produce realistic and diverse samples at a rate of around 30 million single 4D points per minute. However, the model also has some limitations, such as its inability to capture point-to-point correlations and its generation of multiple points per cell, which contradicts the data. To address these issues, we propose a novel method that uses a second sampling step to compute the marginal likelihoods of each cell being hit and to sample the energies accordingly. We also discuss ideas on how to handle the point-to-point correlations in future work. The main strengths of our model are its ability to handle diverse datasets, its fast and stable convergence, and its highly efficient point production.
Building on the success of PC-JeDi we introduce PC-Droid, a substantially improved diffusion model for the generation of jet particle clouds. By leveraging a new diffusion formulation, studying more recent integration solvers, and training on all jet types simultaneously, we are able to achieve state-of-the-art performance for all types of jets across all evaluation metrics. We study the trade-off between generation speed and quality by comparing two attention based architectures, as well as the potential of consistency distillation to reduce the number of diffusion steps. Both the faster architecture and consistency models demonstrate performance surpassing many competing models, with generation time up to two orders of magnitude faster than PC-JeDi and three orders of magnitude faster than Delphes.
In High Energy Physics, detailed and time-consuming simulations are used for particle interactions with detectors. To bypass these simulations with a generative model, it needs to be able to generate large point clouds in a short time while correctly modeling complex dependencies between the particles.
For non-sparse problems on a regular grid, such a model would usually use (De-)Convolution layers to up/down-scale the number of voxels.
In this work, we present novel methods to up/down-scale point clouds. For the up-scaling, we propose the use of a feed-forward network to project each point to multiple points. For the down-scaling, we propose a Message Passing Layer that connects a variable number of input points to a fixed number of trainable points.
These operations allow us to construct a Graph GAN that is able to generate such point clouds in a tree-based manner. Particle showers are inherently tree-based processes, as each particle is produced by decays or detector interactions of a particle of the previous generation. We demonstrate the model's performance on the public JetNet and CaloChallenge datasets.
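The up-scaling step can be sketched as follows (shapes and names are illustrative): a feed-forward layer maps each parent point to k child points, growing the cloud one tree level at a time.

```python
import torch
import torch.nn as nn

class PointUpscale(nn.Module):
    def __init__(self, dim=3, k=2):
        super().__init__()
        self.k, self.dim = k, dim
        self.net = nn.Linear(dim, k * dim)       # one parent -> k children

    def forward(self, pts):                      # (batch, n_points, dim)
        b, n, _ = pts.shape
        return self.net(pts).view(b, n * self.k, self.dim)

cloud = torch.randn(4, 8, 3)                     # 8 "parent" points
print(PointUpscale()(cloud).shape)               # (4, 16, 3): doubled per level
```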
Simulating particle physics data is a crucial yet computationally expensive aspect of analyzing data at the LHC. Typically, in fast simulation methods, we rely on a surrogate calorimeter model to generate a set of reconstructed objects. This work demonstrates the potential to generate these reconstructed objects in a single step, effectively replacing both the calorimeter simulation and reconstruction steps. Our primary goal in this set-to-set generation is to accurately replicate the detector's resolution and the properties of the reconstructed objects.
Building on the success of our previous slot-attention-based model, we introduce two innovative approaches to improve this task and evaluate their performance using a more realistic dataset. This dataset incorporates a realistic detector simulation and a machine learning-based reconstruction algorithm.
In the first approach, we enhance the slot-attention mechanism with a state-of-the-art graph diffusion model. This entails starting with a noisy graph and progressively eliminating noise conditioned on the truth particle set, ultimately generating the reconstructed particles.
The second approach involves iterative graph refinement, directly converting the set of truth particles into the set of reconstructed objects. These approaches outperform our previous baseline in terms of both accuracy and the resolution of predicted particle properties.
Calorimeter response simulation is a critical but computationally expensive part of many physics analyses at the Large Hadron Collider. The simulation time and resource consumption can be effectively reduced by the use of neural networks. Denoising diffusion models are emerging as the state of the art for various generative tasks, ranging from images to sets. We propose a new graph-based diffusion model tailored for fast calorimeter simulation, fitting naturally to non-regular detector geometries. We evaluate the model's performance using the ATLAS dataset from the Fast Calorimeter Simulation Challenge 2022, comparing to existing attempts.
Diffusion generative models are a recent class of generative models that excel in various tasks, including those in collider physics and beyond. Thanks to their stable training and flexibility, these models can easily incorporate symmetries to better represent the data they generate. In this talk, I will provide an overview of diffusion models' key features and highlight their practical applications in collider physics based on recent works, such as fast detector simulation, jet generation, direct density estimation, and anomaly detection of new physics processes.
Generative machine learning models are a promising avenue to resolve computing challenges by replacing intensive full simulations of particle detectors. We introduce CaloDiffusion, a denoising diffusion model that generates calorimeter showers, trained on the public CaloChallenge datasets. Our algorithm employs 3D cylindrical convolutions that take advantage of symmetries in the underlying data. We also introduce a new technique to handle irregular geometries called Geometry Latent Mapping or GLaM, which learns forward and reverse transformations to a regular geometry suitable for symmetry-preserving operations such as convolutions. The showers generated by our approach are nearly indistinguishable from the full simulation, as measured by several different metrics. We also report on several different approaches to speed up the generation process of diffusion models.
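A minimal sketch of a cylindrical convolution (our illustration of the symmetry-preserving idea, not the CaloDiffusion code): wrap the azimuthal axis circularly before a standard convolution, so the kernel respects the detector's periodic geometry.

```python
import torch
import torch.nn.functional as F

def cylindrical_conv(x, weight, bias=None):
    # x: (batch, channels, phi, r) calorimeter image
    x = torch.cat([x[:, :, -1:], x, x[:, :, :1]], dim=2)  # wrap phi axis
    x = F.pad(x, (1, 1))                                  # zero-pad radial axis
    return F.conv2d(x, weight, bias)                      # 3x3 kernel, no extra pad

x = torch.randn(2, 1, 16, 9)
w = torch.randn(8, 1, 3, 3)
print(cylindrical_conv(x, w).shape)              # (2, 8, 16, 9): shape preserved
```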
Given the recent success of diffusion models in image generation, we study their applicability to generating LHC phase-space distributions. We find that they achieve percent-level precision comparable to INNs. To further enhance the interpretability of our results, we quantify our training uncertainty by developing Bayesian versions. In this talk, diffusion models are introduced and discussed, followed by a presentation of our findings.
Simulating showers of particles in highly-granular detectors is a key frontier in the application of machine learning to particle physics. Achieving high accuracy and speed with generative machine learning models would enable them to augment traditional simulations and alleviate a major computing constraint.
This work achieves a major breakthrough in this task by directly generating a point cloud of a few thousand space points with energy depositions in the detector in 3D space, without relying on a fixed-grid structure. This is made possible by two key innovations: i) using recent improvements in generative modeling, we apply a diffusion model to generate ii) an initial, even higher-resolution point cloud of up to 40,000 so-called Geant4 steps, which is subsequently down-sampled to the desired number of up to 6,000 space points. We showcase the performance of this approach using the specific example of simulating photon showers in the planned electromagnetic calorimeter of the International Large Detector (ILD) and achieve overall good modeling of physically relevant distributions. We further distill the diffusion model into a consistency model and achieve a speed-up of 46x over Geant4 on a single CPU.
The simulation of particle interactions with detectors plays a central role in many high energy physics experiments. In the simulation pipeline, the most computationally expensive process is calorimeter shower generation. Looking into the future, as the size and granularity of calorimeters increase and we approach the high-luminosity operational phase of the LHC, the severity of the simulation bottleneck presented by calorimeter shower generation is expected to increase. Recent developments in the field of generative modelling have led to models that are able to produce high-dimensional, high-fidelity samples. When applied to calorimeter shower generation, generative models take orders of magnitude less time to produce the desired high-granularity detector response. In this work we introduce a new fast surrogate model based on latent diffusion models, named CaloLatent, able to reproduce, with high fidelity, the detector response in a fraction of the time required by similar generative models. We evaluate the generation quality and speed using the Calorimeter Simulation Challenge 2022 dataset.
Transformers have become the primary architecture for natural language processing. In this study, we explore their use for auto-regressive density estimation in high-energy jet physics. We draw an analogy between sentences and words in natural language, and jets and their constituents. Specifically, we investigate density estimation for light QCD jets and hadronically decaying boosted top jets. We exploit the generative capability of our setup to assess the quality of the density estimate. Our results indicate that the generated data samples closely resemble the original data; in particular, they are difficult to distinguish from the original data even by a powerful supervised classifier.
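The sentence/jet analogy can be made concrete with a short sketch (tokenization and sizes are illustrative): constituents are discretized into tokens, and the jet likelihood factorizes autoregressively as log p(jet) = Σ_i log p(token_i | token_<i).

```python
import torch
import torch.nn as nn

vocab, dim = 1000, 64                            # e.g. binned (pT, eta, phi) tokens
embed = nn.Embedding(vocab, dim)
layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
body = nn.TransformerEncoder(layer, num_layers=2)
head = nn.Linear(dim, vocab)

def jet_log_prob(tokens):                        # (batch, n_constituents)
    x = embed(tokens[:, :-1])
    n = x.size(1)
    mask = torch.triu(torch.full((n, n), float("-inf")), diagonal=1)  # causal
    logits = head(body(x, mask=mask))
    logp = logits.log_softmax(-1).gather(-1, tokens[:, 1:, None]).squeeze(-1)
    return logp.sum(dim=1)                       # log-density of each jet

print(jet_log_prob(torch.randint(vocab, (8, 20))))
```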
For Monte Carlo event generators, simulating events with full inclusion of off-shell effects is a computationally very costly task. In this talk, a method making use of modern machine learning techniques is presented that enables the modelling of full off-shell effects. Using this method as a surrogate for simulations, we expect significant improvements in the feasibility of high-precision event generation.
Generative networks are promising tools in fast event generation for the LHC, yet struggle to meet the required precision when scaling up to large multiplicities. We employ the flexibility of autoregressive transformers to tackle this challenge, focusing on Z and top quark pair production with additional jets. In order to further increase precision, we use classifiers to reweight the generated distributions.
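Classifier reweighting as mentioned above can be sketched in a toy example (one-dimensional stand-in data and a scikit-learn classifier in place of a deep network): a classifier separating truth from generated events yields per-event weights w(x) = p(x) / (1 − p(x)).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
truth = rng.normal(0.0, 1.0, size=(5000, 1))     # stand-in for MC truth
generated = rng.normal(0.1, 1.1, size=(5000, 1)) # stand-in for network output

X = np.vstack([truth, generated])
y = np.concatenate([np.ones(5000), np.zeros(5000)])
clf = GradientBoostingClassifier().fit(X, y)

p = np.clip(clf.predict_proba(generated)[:, 1], 1e-6, 1 - 1e-6)
weights = p / (1.0 - p)                          # likelihood-ratio weights
print(weights.mean())                            # ~1 once distributions match
```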
In High Energy Physics, generating physically meaningful parton configurations from a collision reconstructed within a detector is a critical step for many complex analysis tasks such as the Matrix Element Method computation and Bayesian inference on parameters of interest. This contribution introduces a novel approach that employs generative machine learning architectures, Transformers combined with Normalizing Flows, to accomplish this complex task.
Traditionally, regressing parton level quantities, such as the transverse momentum of single particles, is a common task in High Energy Physics. However, performing this task on the full event description at the parton level poses significant challenges. Furthermore, attempts to draw samples from a parton level probability density function are rare.
We propose to tackle this problem from a new perspective by using a Transformer network to analyze the full event description at the reconstruction level (including jets and leptons). This approach extracts a latent information vector, which is then used to condition a Normalizing Flow network. The Normalizing Flow learns the conditional probability density at the parton level directly and is trained to generate probable sets of partons that are compatible with the observed objects. Our strategy is applicable to events with higher jet multiplicities and can be scaled to model additional radiation at the parton level.
We will present the performance of the first version of this architecture applied to a complex final state, such as the ttH(bb) semileptonic channel. Additionally, we will discuss possible applications of the method.
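A minimal sketch of this conditioning scheme (shapes are illustrative; a real model would stack many flow layers): a transformer summary of the reco-level objects parameterizes a transformation of a base density over parton-level degrees of freedom.

```python
import torch
import torch.nn as nn

enc_layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
cond_net = nn.Linear(32, 2 * 8)                  # scale and shift for 8 parton dofs

def sample_partons(reco_objects):                # (batch, n_objects, 32)
    context = encoder(reco_objects).mean(dim=1)  # latent event summary
    s, t = cond_net(context).chunk(2, dim=1)
    z = torch.randn(reco_objects.size(0), 8)     # base density
    return z * torch.exp(torch.tanh(s)) + t      # one conditional affine step

print(sample_partons(torch.randn(4, 10, 32)).shape)  # (4, 8) parton-level sample
```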
The matrix element method remains a crucial tool for LHC inference in scenarios with limited event data. We enhance our neural network-based framework, now dubbed MEMeNNto, by optimizing phase-space integration techniques and introducing an acceptance function. Additionally, employing new architectures, like transformer and diffusion models, allows us to better handle complex jet combinatorics associated with initial-state radiation (ISR). These improvements are showcased again through the CP-violating phase of the top Yukawa coupling in associated Higgs and single-top production, underlining the enhanced capabilities of our revised approach.
Theory predictions for the LHC require precise numerical phase-space integration and generation of unweighted events. We combine machine-learned multi-channel weights with a normalizing flow for importance sampling to improve classical methods for numerical integration. By integrating buffered training for potentially expensive integrands, VEGAS initialization, symmetry-aware channels, and stratified training, we improve both the efficiency and the accuracy of the integration. We empirically validate these enhancements through rigorous tests on diverse LHC processes, including VBS and W+jets.
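The core importance-sampling idea can be sketched in a few lines: fit a trainable proposal density to the integrand by minimizing the variance of the weights f/q, then read off the integral estimate and unweighting efficiency. Here a diagonal Gaussian stands in for the normalizing flow, and multi-channel weights and VEGAS initialization are omitted:

```python
import torch

# Schematic neural importance sampling on a toy 2D integrand.
def f(x):                                           # positive toy integrand
    return torch.exp(-0.5 * ((x - 1.0) / 0.3) ** 2).prod(-1)

mu = torch.zeros(2, requires_grad=True)
log_sigma = torch.zeros(2, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=0.05)

for step in range(200):
    q = torch.distributions.Normal(mu, log_sigma.exp())
    x = q.rsample((4096,))                          # reparameterized samples
    w = f(x) / q.log_prob(x).sum(-1).exp()          # importance weights f/q
    loss = (w ** 2).mean()                          # minimize weight variance
    opt.zero_grad(); loss.backward(); opt.step()

q = torch.distributions.Normal(mu.detach(), log_sigma.detach().exp())
x = q.sample((100000,))
w = f(x) / q.log_prob(x).sum(-1).exp()
print("integral ~", w.mean().item(),
      "unweighting efficiency ~", (w.mean() / w.max()).item())
```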
Modern machine learning is revolutionizing our understanding of big data for fundamental physics, promising to shed light on long-standing questions such as "where is the new physics" and "what is the dark matter". In this talk I will give an overview of recent, exciting developments in areas such as model-agnostic searches, fast simulation and interpretability. I will also highlight the cross-cutting nature of machine learning, illustrating how methods originally developed for the LHC are being applied in new and interesting ways to astronomy, motivated by fundamental physics.
We present a novel, data-driven analysis of Galactic dynamics, using unsupervised machine learning -- in the form of density estimation with normalizing flows -- to learn the underlying phase space distribution of 6 million nearby stars from the Gaia DR3 catalog. Solving the collisionless Boltzmann equation with the assumption of approximate equilibrium, we calculate -- for the first time ever -- a model-free, unbinned, fully 3D map of the local acceleration and mass density fields within a 3 kpc sphere around the Sun. We find clear evidence for dark matter throughout the analyzed volume. Assuming spherical symmetry and averaging mass density measurements, we find a local dark matter density of 0.47 ± 0.05 GeV/cm$^3$. We fit our results to a generalized NFW profile, and find a profile broadly consistent with other recent analyses.
Utilizing 21cm tomography provides a unique opportunity to directly investigate the astrophysical and fundamental aspects of early stages of our Universe's history, spanning the Epoch of Reionization (EoR) and Cosmic Dawn (CD). Due to the non-Gaussian nature of signals that trace this period of the Universe, methods based on summary statistics omit important information about the underlying physics. Here we demonstrate that likelihood-free inference with a BayesFlow setup consisting of a 3D CNN and a small cINN can give us the full posterior in a consistent and fast way. The chosen parameter set reflects a warm dark matter universe, where the cosmological parameters strongly influence the CD and EoR parameters.
Cosmic inflation is a process in the early Universe responsible for the generation of cosmic structures. The dynamics of the scalar field driving inflation is determined by its self-interaction potential and is coupled to the gravitational dynamics of the FLRW background. In addition, perturbations of the inflaton field can be computed by numerical solution of the so-called mode equations. They have straightforward solutions for slowly evolving fields, but become significantly more complex in the case of realistic inflaton dynamics. Physics-informed neural networks (PINNs) are well suited to emulating this particular dynamical system, allowing very fast predictions of fluctuation spectra for given inflationary potentials. PINNs open the possibility of reconstructing these potentials on the basis of, e.g., cosmic microwave background observations. Formulating the dynamics of the complex-valued perturbations in the Madelung picture yields significant numerical advantages and allowed us to find a new constant of motion. Cosmic inference and reconstruction of potentials with associated errors require an extension to Bayesian networks, which we are currently investigating.
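The physics-informed training loop can be illustrated on a toy mode equation; here a network u(t) is trained to satisfy u'' + k²u = 0 with u(0) = 1, u'(0) = 0, a deliberately simplified stand-in for the full perturbation dynamics:

```python
import torch
import torch.nn as nn

# Minimal PINN sketch for a toy harmonic mode equation.
k = 2.0
net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 64),
                    nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    t = 6.0 * torch.rand(256, 1)                    # collocation points
    t.requires_grad_(True)
    u = net(t)
    du = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), t, create_graph=True)[0]
    residual = d2u + k ** 2 * u                     # ODE residual at t
    t0 = torch.zeros(1, 1, requires_grad=True)      # initial conditions
    u0 = net(t0)
    du0 = torch.autograd.grad(u0.sum(), t0, create_graph=True)[0]
    loss = (residual ** 2).mean() \
        + (u0 - 1.0).pow(2).mean() + du0.pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
# After training, net(t) should approximate cos(k t) over the whole domain,
# at the cost of one training run rather than one ODE solve per query.
```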
We have developed a neural network-based pipeline for estimating galaxy cluster masses directly from X-ray photon data, using known redshift information. Our approach involves training convolutional neural networks on eROSITA simulations, with a focus on the Final Equatorial Depth Survey (eFEDS) dataset. Unlike previous methods, our approach incorporates additional cluster information, including redshift, and uses simulations that include background and point sources. This enables mass estimation for a wide range of clusters ($10^{13} M_{\odot} < M < 10^{15} M_{\odot}$) directly from observational eROSITA data. We've applied this method to eFEDS clusters and achieved consistent results with weak lensing-calibrated masses, without using weak lensing data itself. Compared to simulated data, our method shows reduced scatter in relation to luminosity and count-rate-based scaling relations.
We introduce the revamped HEPML Living Review: a more accessible website dedicated to the interplay of High-Energy Physics and Machine Learning. Featuring a new 'Recent' section and more anticipated features, we actively seek and encourage ongoing community input, envisioning this platform as a dynamic and continuously evolving exchange.
The development of techniques based on machine learning (ML) relies on the availability of datasets. Many studies are carried out within the context of particular experiments, using e.g. their simulation data. This narrows down the possibilities for collaboration as well as publication, with only limited datasets published for open access.
This gap can be bridged with the datasets produced with the Open Data Detector (ODD), a detector designed for algorithm research and development. Its goal is to provide a benchmark detector with public simulation data released and available for algorithm studies. Such data can be used for ongoing activities in areas such as fast simulation or reconstruction.
The tracking system of the ODD is an evolution of the detector used in the successful Tracking Machine Learning Challenge, offering a more complex and realistic design. It is complemented with granular calorimetry and will be completed with a muon system. The magnetic field in the detector can be created by a solenoid located either in front of or behind the calorimeters, providing two alternative options for detector studies.
The Calo Challenge, the first ML challenge focused on the development of ML-based fast shower simulation, provided valuable feedback on dataset design for such studies. Offering different representations of the shower data is among the most important features of the ODD dataset; this should avoid biasing studies towards a particular choice of ML architecture. A wider range of particles and wider pseudorapidity coverage will present a more realistic version of the complexity that experiments face. Ultimately, ODD dataset users will be provided with the possibility of inserting their models inside the simulation framework, allowing a fair comparison of full and fast simulation in terms of accuracy as well as time and memory performance.
Particle physics is governed by a number of fundamental symmetries including Lorentz symmetry, gauge symmetries of the Standard Model, and discrete symmetries like charge, parity, and time. Consequently, designing equivariant ML architectures has emerged as a popular method for incorporating physics-inspired inductive biases into ML models. In this work, we evaluate commonly cited benefits of equivariant architectures for jet tagging and particle tracking, including model accuracy, generalizability, and model/data efficiency. We conclude that many of the proposed benefits of equivariant models do not universally hold. We then discuss possible reasons this may be the case, including limited expressivity of equivariant models and symmetry breaking introduced by the experimental apparatuses used to collect physics data. We explore semi-equivariant architectures as a possible solution to address these limitations and introduce preliminary explainability studies that seek to characterize possible differences in what unconstrained, equivariant, and semi-equivariant architectures are learning in order to model the same physics task.
Improving the identification of jets initiated by gluons or quarks will impact the precision of several analyses in the ATLAS physics program. Current identification algorithms (taggers) take as inputs high-level jet kinematic and substructure variables, such as the number of tracks associated to the jet or the jet width. We present a novel approach to tag quark- and gluon-initiated jets using jet constituents reconstructed with the ATLAS reconstruction algorithm. Using jet constituents as inputs gives the models access to a superset of the information contained in the high-level variables. A Transformer architecture is used to learn long-range dependencies between jet-constituent kinematic variables to predict the jet flavour. Several variations of Transformers are studied, and their performance is compared to the high-level-variable taggers and older jet-constituent taggers. We propose a new Transformer-based architecture (DeParT) that outperforms all other taggers. The models are also evaluated on events generated by multiple Monte Carlo generators to study their generalization capabilities.
Tools for discriminating quark and gluon jets are of key importance at the LHC. Methods that train directly on real data are well motivated due to both the ambiguity of parton labels and the potential for mismodelled jet substructure in Monte Carlo. This talk presents a study of weakly-supervised learning applied to Z+jet and dijet events in CMS Open Data. Using CWoLa classifiers, we investigate the quark/gluon content of the datasets under the jet topics framework. We also implement TopicFlow: a deep generative model that disentangles quark and gluon distributions from mixed datasets. We discuss the use of TopicFlow both as a generative classifier and as a way to evaluate quark/gluon tagging performance.
While recent machine learning solutions to particle physics tasks have improved statistical power over hand-crafted methods, they often discount the importance of explainability and of the theoretical foundations of the problems they are used to address. This talk will present a comprehensive description of the latest version of the PELICAN network, a permutation- and Lorentz-equivariant network architecture for particle physics. We demonstrate significant improvements in particle classification and four-vector regression tasks while maintaining a lightweight and uniquely explainable architecture that allows for new approaches to interpreting the network's performance and results. PELICAN operates on lists of four-momenta and allows for both scalar and four-momentum outputs while respecting permutation and Lorentz symmetries. We showcase PELICAN's classification performance in the context of various hadronic final states: discriminating top-quark jets from QCD backgrounds; discriminating gluon- from light-quark-induced jets; and multi-classification of gluon, light quark, $W$ boson, $Z$ boson, and top-quark jets. Further, we investigate the model dependence of the network performance and characterize the behavior of PELICAN's full classification-to-regression pipeline in the context of $W$ boson reconstruction in fully hadronic top-quark decays with QCD background, including infrared- and collinear-safe instances of the network.
Top-performing jet networks often compromise infrared and collinear (IRC) safety, leading to a dilemma between pursuing high experimental performance and good theoretical interpretability. In this talk, we present an innovative modification of the classic Transformer self-attention block (whose tokens are per-particle inputs) that ensures full IRC safety. By integrating this recipe into the Particle Transformer (ParT), we create a version of ParT with built-in IRC safety, which has a marginal performance trade-off but outperforms all existing IRC-safe and even many IRC-unsafe networks. This recipe can be adapted to the various jet Transformer networks that are commonly considered state-of-the-art in multiple fields, hence providing a promising solution to the experimental-theoretical dilemma.
Energy correlators, which are correlation functions of the energy flow operator, are theoretically clean observables that can be used to improve various measurements. In this talk, we discuss ongoing work exploring the benefits of combining them with machine learning.
The Energy Mover's Distance (EMD) has seen use in collider physics as a metric between events and as a geometric method for defining IRC-safe observables. Recently, the spectral EMD (SEMD) has been proposed as a more analytically tractable alternative to the EMD. In this work, we obtain a closed-form expression for the $p = 2$ SEMD metric between events, removing the need to numerically solve an optimal transport problem. Additionally, we show how the SEMD can be used to define event and jet shape observables by minimizing the metric between events and parameterized energy flows (similar to the EMD), and we obtain closed-form expressions for several of these observables. We present this as part of the SPECTER framework, an efficient and highly parallelized implementation of the SEMD metric and SEMD-derived shape observables that offers a significant speedup compared to traditional optimal transport methods.
Weakly supervised methods have emerged as a powerful tool for model agnostic anomaly detection at the LHC. While these methods have shown remarkable performance on specific signatures such as di-jet resonances, their application in a more model-agnostic manner requires dealing with a larger number of potentially noisy input features. We show that neural networks struggle with noisy input features and that this issue can be solved by using boosted decision trees. Overall, boosted decision trees have a superior and more predictable performance in the weakly supervised setting than neural networks. Additionally, we significantly improve the performance by using an extended set of features.
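The contrast between trees and networks in the weakly supervised setting can be sketched as a standard CWoLa-style exercise: train on a mixed, signal-enriched sample against a background-like reference and score events by how data-like they appear. The toy arrays and small injected signal below are illustrative:

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

# Schematic weakly supervised setup: mixed "data" vs. background reference.
rng = np.random.default_rng(1)
background = rng.normal(0, 1, size=(50000, 10))
signal = rng.normal(0.8, 1, size=(1000, 10))
data = np.vstack([background[:25000], signal])      # mixed, signal-enriched
reference = background[25000:]                      # background-like sample

X = np.vstack([data, reference])
y = np.concatenate([np.ones(len(data)), np.zeros(len(reference))])
bdt = HistGradientBoostingClassifier(max_iter=200).fit(X, y)
# Events scored as most "data-like" are the anomaly-enriched candidates;
# tree ensembles tend to stay stable when extra noisy features are appended.
scores = bdt.predict_proba(data)[:, 1]
```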
Recent data-driven anomaly detection methods, such as CWoLA and ANODE, have shown promising results. However, they all suffer from performance degradation when irrelevant features are included. We demonstrate how these methods can be made robust even when the dataset is dominated by irrelevant features. The key idea is to employ Boosted Decision Tree (BDT)-based algorithms for signal/background discrimination and/or probability density estimation. This approach provides a natural measure of feature relevance, and can aid in constructing more interpretable models. Another advantage is that training the BDT algorithm requires significantly less computational resources than the earlier neural-network based approaches to this problem.
We employ the diffusion framework to generate background-enriched templates to be used in a downstream anomaly detection task (generally with CWoLa). We show how Drapes can provide an analogue to many different methods of template generation common in the literature, and show good performance on the public LHCO R&D dataset.
Machine learning--based anomaly detection (AD) methods are promising tools for extending the coverage of searches for physics beyond the Standard Model (BSM). One class of AD methods that has received significant attention is resonant anomaly detection, where the BSM is assumed to be localized in at least one known variable. While there have been many methods proposed to identify such a BSM signal that make use of simulated or detected data in different ways, there has not yet been a study of the methods' complementarity. To this end, we address two questions. First, in the absence of any signal, do different methods pick the same events as signal-like? If not, then we can significantly reduce the false-positive rate by comparing different methods on the same dataset. Second, if there is a signal, are different methods fully correlated? Even if their maximum performance is the same, since we do not know how much signal is present, it may be beneficial to combine approaches. Using the Large Hadron Collider (LHC) Olympics dataset, we provide quantitative answers to these questions. We find that there are significant gains possible by combining multiple methods, which will strengthen the search program at the LHC and beyond.
Physics beyond the Standard Model that is resonant in one or more dimensions has been the subject of many anomaly detection studies. This resonant anomaly detection is well-suited for weakly supervised machine learning, where sideband information can be used to generate synthetic datasets representing the Standard Model background. One effective strategy is to learn a conditional generative model that can be interpolated into the signal region to generate synthetic samples. Until now, this approach was only able to accommodate a relatively small number of dimensions, limiting the breadth of the search sensitivity. Using recent innovations in point cloud generative models, we show that this strategy can also be applied to the full phase space, using all relevant particles for the anomaly detection. As a proof of principle, we show that the signal from the R&D dataset from the LHC Olympics is findable with this method, opening up the door to future studies that explore the interplay between depth and breadth in the representation of the data for anomaly detection.
In many well-motivated models of the electroweak scale, cascade decays of new particles can result in highly boosted hadronic resonances (e.g. $Z/W/h$). This can make these models rich and promising targets for recently developed resonant anomaly detection methods powered by modern machine learning. We demonstrate this using the state-of-the-art CATHODE method applied to supersymmetry scenarios with gluino pair production. We show that CATHODE, despite being model-agnostic, is nevertheless competitive with dedicated cut-based searches, while simultaneously covering a much wider region of parameter space. The gluino events also populate the tails of the missing energy and $H_T$ distributions, making this a novel combination of resonant and tail-based anomaly detection.
The Higgs-gluon interaction is crucial for LHC phenomenology. To improve the constraints on the CP structure of this coupling, we investigate Higgs production with two jets using machine learning. In particular, we exploit the CP sensitivity of the so far neglected phase space region that differs from the typical vector boson fusion-like kinematics. Our results suggest that significant improvements in current experimental limits are possible. We also discuss the most relevant observables and how CP violation in the Higgs-gluon interaction can be disentangled from CP violation in the interaction between the Higgs boson and massive vector bosons. Assuming the absence of CP-violating Higgs interactions with coloured beyond-the-Standard-Model states, our projected limits on a CP-violating top-Yukawa coupling are stronger than more direct probes like top-associated Higgs production and limits from a global fit.
Optimal kinematic observables are often defined in specific frames and then approximated at the reconstruction level. We show how multi-dimensional unfolding methods allow us to reconstruct these observables in their proper rest frame and in a probabilistically faithful way. We illustrate our approach with a measurement of a CP-phase in the top Yukawa coupling. Our method makes use of key advantages of generative unfolding, but as a constructed observable it fits into standard LHC analysis frameworks.
High-energy collisions at the Large Hadron Collider (LHC) provide valuable insights into open questions in particle physics. However, detector effects must be corrected before measurements can be compared to certain theoretical predictions or measurements from other detectors. Methods to solve this inverse problem of mapping detector observations to theoretical quantities of the underlying collision, referred to as unfolding, are essential parts of many physics analyses at the LHC. We investigate and compare various generative deep learning methods for unfolding at parton level. We introduce a novel unified architecture, termed latent variational diffusion models, which combines the latent learning of cutting-edge generative art approaches with an end-to-end variational framework. We demonstrate the effectiveness of this approach for reconstructing global distributions of theoretical kinematic quantities, as well as for ensuring the adherence of the learned posterior distributions to known physics constraints. Our unified approach improves the reconstruction of parton-level kinematics as measured by several distribution-free metrics.
The radiation pattern within quark- and gluon-initiated jets (jet substructure) is used extensively as a precision probe of the strong force and for optimizing event generators for particle physics. Jet substructure measurements in electron-proton collisions are of particular interest as many of the complications present at hadron colliders are absent.
In this contribution, a detailed study of jet substructure observables, so-called jet angularities, is presented using data recorded by the H1 detector at HERA. The measurement is unbinned and multi-dimensional, using a novel machine learning technique to correct for detector effects. All of the available reconstructed object information inside a jet is interpreted using a graph neural network, and training of these networks was performed on the Perlmutter supercomputer at Berkeley Lab. Results are reported at high transverse momentum transfer Q² > 150 GeV², and the analysis is also performed in sub-regions of Q², thus probing the scale dependence of the substructure variables.
PLB 844 (2023) 138101 [arXiv:2303.13620]
Experimental data on a wide range of jet observables measured in heavy ion collisions provide a rich picture of the modification of jets as perturbative probes and of the properties of the created quark-gluon plasma. However, their interpretation is often limited by the assumptions of specific quenching models, and it remains a challenge to establish model-independent statements about the universality in different jet quenching observables.
In this work, we propose a treatment that is agnostic to the details of the jet-medium interactions and relies only on the factorization picture of QCD. Bayesian inference is used to learn the quark- and gluon-jet quenching directly from experimental data of inclusive jet observables. Evidence of the universality of jet quenching is provided by validating the learned jet energy loss through the prediction of photon-tagged jet measurements, for which the quark/gluon fraction differs from that in inclusive jets, across momenta. The extracted posterior distributions can then serve to retrieve theoretical insight in a data-driven way, and can be employed to constrain theoretical models for jet quenching.
Progress in the theoretical understanding of parton branching dynamics within an expanding QGP relies on detailed and fair comparisons with experimental data for reconstructed jets. Such validation is only meaningful when the computed object, be it obtained analytically or via event generation, accounts for the complexity of experimentally reconstructed jets. The reconstruction of jets in heavy-ion collisions involves a necessarily imperfect subtraction of the large and fluctuating background: reconstructed jets always include background contamination. The identification of jet quenching effects, that is, modifications of the branching dynamics by interaction with the QGP leading to changes in jet observables, should be made against a baseline that accounts for possible background contamination of unmodified jets. In practical terms, jet quenching effects are only those not present in samples of vacuum jets that have been embedded in a realistic heavy-ion background, with subtraction carried out analogously to the heavy-ion case and as close as possible to what is done experimentally. Using the extensively validated JEWEL event generator, we will present an extensive survey of the sensitivity of commonly used jet observables to background effects. Further, we will assess the robustness of machine learning studies aimed at classifying jets according to their degree of modification by the QGP, e.g. [1], against a reference where background contamination is accounted for.
[1] Miguel Crispim Romão, José Guilherme Milhano and Marco van Leeuwen, arXiv:2304.07196
Searching for non-resonant signals at the LHC is a relatively underexplored, yet challenging approach to discover new physics. These signals could arise from off-shell effects or final states with significant missing energy. This talk explores the potential of using weakly supervised anomaly detection to identify new non-resonant phenomena at the LHC. Our approach extends existing resonant anomaly detection methods from background interpolation to extrapolation. We use semi-visible jets, a type of signature predicted by dark QCD models, as a benchmark to test the sensitivity of the proposed methods.
We present improvements to model-agnostic resonant anomaly detection based on normalizing flows.
To maximize the discovery potential of high-energy colliders, experimental searches should be sensitive to unforeseen new physics scenarios. This goal has motivated the use of machine learning for unsupervised anomaly detection. In this paper, we introduce a new anomaly detection strategy called FORCE: factorized observables for regressing conditional expectations. Our approach is based on the inductive bias of factorization, which is the idea that the physics governing different energy scales can be treated as approximately independent. Assuming factorization holds separately for signal and background processes, the appearance of non-trivial correlations between low- and high-energy observables is a robust indicator of new physics. Under the most restrictive form of factorization, a machine-learned model trained to identify such correlations will in fact converge to the optimal new physics classifier. We test FORCE on a benchmark anomaly detection task for the Large Hadron Collider involving collimated sprays of particles called jets. By teasing out correlations between the kinematics and substructure of jets, FORCE can reliably extract percent-level signal fractions. This strategy for uncovering new physics adds to the growing toolbox of anomaly detection methods for collider physics with a complementary set of assumptions.
Semivisible jets are a novel signature of dark matter scenarios where the dark sector is confining and couples to the Standard Model via a portal. They consist of jets of visible hadrons intermixed with invisible stable particles that escape detection. In this work, we use normalized autoencoders to tag semivisible jets in proton-proton collisions at the CMS experiment. Unsupervised models are desirable in this context since they can be trained on background only, and are thus robust with respect to the details of the signal. The use of an autoencoder as an anomaly detection algorithm relies on the assumption that the network better reconstructs examples it was trained on than ones from a different probability distribution, i.e., anomalies. Using the search for semivisible jets as a benchmark, we demonstrate the tendency of autoencoders to generalize beyond the dataset they are trained on, hindering their performance. We show how normalized autoencoders, specifically designed to suppress this effect, give a sizable boost in performance. We further propose a modified loss function and signal-agnostic condition to reach the optimal performance.
The development of precise and computationally efficient simulations is a central challenge in modern physics. With the advent of deep learning, new methods are emerging from the field of generative models. Recent applications to the generation of calorimeter images showed promising results, motivating applications in astroparticle physics. In this contribution, we introduce a deep-learning-based model for the fast generation of air shower images as measured by an Imaging Air Cherenkov Telescope (IACT). Our work relies on simulations of the CT5 telescope that is part of the High Energy Stereoscopic System (H.E.S.S.) and features the FlashCam camera system with more than 1500 pixels, which will also be utilised in the Cherenkov Telescope Array (CTA), a next-generation gamma-ray observatory.
We show that our deep-learning approach, based on Wasserstein Generative Adversarial Networks, can efficiently generate gamma-ray images with good quality. Besides analysing the distributions of low-level parameters, we further examine the quality of the generated images using the Hillas parameters, a well-known parameterisation of IACT images characterising the properties and the shape of the measured Cherenkov image. The finding that our algorithm is able to reproduce the correct distributions of the low-level and the Hillas parameters, as well as their correlations, opens promising perspectives for fast and efficient simulations in gamma-ray astronomy.
The properties of hot and/or dense nuclear matter are studied in the laboratory via Heavy-Ion Collision (HIC) experiments. Of particular interest are intermediate-energy heavy-ion collisions that create strongly interacting matter of moderate temperatures and high densities, where interesting structures in the QCD phase diagram, such as a first-order phase transition from a gas of hadrons to Quark Gluon Plasma or a critical endpoint, are conjectured. Such densities and temperatures are also expected in astrophysical phenomena such as binary neutron star mergers and supernova explosions. The experimental observables are compared with model predictions to extract the underlying properties of the matter created in such collisions. However, the model calculations are often computationally expensive and can be extremely slow. Therefore, to exploit the full potential of the upcoming HIC experiments, fast simulation methods are necessary.
In this work, we present “ParticleGrow”, a novel autoregressive point cloud generator that can simulate heavy-ion collisions on an event-by-event basis. Heavy-ion collision events from the microscopic UrQMD model are used to train the generative model. The model, built on the PointGrow algorithm, generates the momentum ($p_x$, $p_y$ and $p_z$) and PID (7 different hadronic species) particle by particle in an autoregressive fashion to construct a collision event. The distributions of the generated particles and different observables are compared with the UrQMD distributions. It is shown that the generative model can accurately reproduce different observables and effectively capture several underlying correlations in the training data.
caloutils is a Python package built to simplify and streamline the handling, processing, and analysis of 4D point cloud data derived from calorimeter showers in high-energy physics experiments. The package includes tools to map between continuous point clouds and discrete calorimeter cells.
Furthermore, the library contains models for evaluating the performance of generative models of calorimeter showers.
As the library is fully based on point clouds, the provided tools and metrics should apply to any calorimeter and scale well with the number of cells.
Well-trained classifiers and their complete weight distributions provide a well-motivated and practical method to test generative networks in particle physics. I will illustrate their benefits for distribution-shifted jets, calorimeter showers, and reconstruction-level events. In all cases, the classifier weights make for a powerful test of the generative network, identify potential problems in the density estimation, relate them to the underlying physics, and tie in with a comprehensive precision and uncertainty treatment for generative networks.
Due to the large computing resources spent on the detailed (full) simulation of particle transport in HEP experiments, many efforts have been undertaken to parametrise the detector response. In particular, particle showers developing in the calorimeters are typically the most time-consuming component of simulation, hence their parametrisation is the primary focus.
Fast shower simulation has been explored by different researchers, with several machine learning (ML) models proposed on different shower datasets, including those published in the context of the Calo Challenge. Most of those models are developed and validated against the published shower datasets, without deployment in the experiments' frameworks. The speed-up of those models with respect to full simulation cannot be estimated under such conditions, especially if a large batch size is used at ML inference.
This study presents the basic aspects that create an overhead in fast shower simulation and that should be taken into account for realistic performance estimates. It is based on the Geant4 example Par04, which was used to produce datasets 2 and 3 of the Calo Challenge. The placement of energy deposits originating from single showers in the calorimeter is discussed, with results for several methods and details of how it may differ between HEP experiments. A second important factor is then presented: realistic ML inference batch sizes. A study of benchmark physics events has been performed to determine the average number of showers in proton-proton as well as electron-positron collisions at future accelerators. These two factors are important (although not the only ones) in any estimate of the ultimate speed-up ML models can achieve once deployed in the experiments' frameworks.
In this talk I will present a recent strategy to perform a goodness-of-fit test via two-sample testing, powered by machine learning. This approach makes it possible to evaluate the discrepancy between a data sample of interest and a reference sample in an unbiased and statistically sound fashion. The model leverages the ability of classifiers to estimate the density ratio of the data-generating distributions in order to build a statistical test based on the Neyman-Pearson approach (arXiv:2305.14137). I will discuss the general framework and focus on an implementation based on kernel methods, which is efficient while maintaining high flexibility (arXiv:2204.02317). Initially developed to perform model-independent searches for new physics with collider data, it can be used effectively for different tasks such as online data quality monitoring (arXiv:2303.05413) and the evaluation of simulators and generative models.
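A schematic version of such a classifier-based test, with a density-ratio statistic and a permutation-based null, assuming illustrative toy samples and an arbitrary small network in place of the kernel-method implementation:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Schematic two-sample test: a classifier estimates the density ratio between
# data and reference, and a likelihood-ratio-style statistic is compared to
# its permutation null. Shapes and the tiny permutation loop are illustrative.
rng = np.random.default_rng(2)
reference = rng.normal(0, 1, size=(20000, 3))
data = rng.normal(0.05, 1, size=(5000, 3))          # mild toy distortion

def test_statistic(ref, dat):
    X = np.vstack([ref, dat])
    y = np.concatenate([np.zeros(len(ref)), np.ones(len(dat))])
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300).fit(X, y)
    p = np.clip(clf.predict_proba(dat)[:, 1], 1e-6, 1 - 1e-6)
    # p/(1-p) * N_ref/N_dat estimates p_data(x)/p_ref(x) for each event
    return 2.0 * np.sum(np.log(p / (1 - p) * len(ref) / len(dat)))

t_obs = test_statistic(reference, data)
pooled = np.vstack([reference, data])
t_null = []
for _ in range(20):                                  # permutation null
    rng.shuffle(pooled)
    t_null.append(test_statistic(pooled[:20000], pooled[20000:]))
p_value = np.mean([t >= t_obs for t in t_null])
```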
I will summarize the results of the CaloChallenge, a HEP community challenge on generating calorimeter showers with deep generative models that took place in 2022/2023.
We propose a new method based on machine learning to play the devil's advocate and investigate the impact of unknown systematic effects in a quantitative way. This method proceeds by reversing the measurement process and using the physics results to interpret systematic effects under the Standard Model hypothesis.
We explore this idea with two alternative approaches: one relies on a combination of gradient descent and optimisation techniques, while the other employs reinforcement learning.
We illustrate the potential of the presented method with two examples: first, the case of a branching fraction measurement of the decay of a b-hadron; second, the determination of the $P_{5}^{'}$ angular observable in $B^0 \to K^{*0} \mu^+ \mu^-$ decays.
We find that for the former, the size of a hypothetical hidden systematic uncertainty strongly depends on the kinematic overlap between the signal and control channels, while the latter is very robust against possible mismodellings of the efficiency.
Machine learning based jet tagging techniques have greatly enhanced the sensitivity of measurements and searches involving boosted final states at the LHC. However, differences between the Monte Carlo simulations used for training and data lead to systematic uncertainties on tagger performance. This talk presents the performance of boosted top and W boson taggers when applied to data sets containing systematic variations that approximate some of these differences. The taggers are shown to have differing sensitivity to the systematic variations, with the most powerful taggers showing the largest sensitivity. This trend presents obstacles for the further deployment of machine learning techniques at the LHC, and an open challenge for the HEP-ML community.
Neural Networks coupled with a Monte Carlo method can be used to perform regression in the presence of incomplete information. A methodology based on this idea has been developed for the determination of parton distributions, and a closure testing methodology can be used in order to verify the reliability of the uncertainty in the results.
A relevant question in this context is what happens if the uncertainty of the input data is incorrectly estimated in the first place. We investigate this issue by a suitable adaptation of the closure testing methodology.
Deep neural network based classifiers allow for efficient estimation of likelihood ratios in high-dimensional spaces. Classifier-based cuts are thus being used to process experimental data, for example in top tagging. To investigate new theories efficiently, it is essential to be able to predict the behavior of these cuts cheaply. We suggest circumventing the full simulation of the experimental setup and instead predicting the classifier output from high-level features. The in-distribution behavior is modeled using a generative mapping, while out-of-distribution areas are flagged using Bayesian machine learning. We compare standard methods of Bayesian deep learning, as well as a novel stochastic Markov chain, to a baseline of full Monte Carlo sampling.
Applications of Machine Learning to physics beyond the Standard Model are becoming increasingly valuable for theorists. As a leading proposal for a theory of quantum gravity, string theory gives rise to a plethora of 4-dimensional EFTs upon compactification, the so-called string landscape. For decades, a prohibiting factor in analysing these EFTs has been the computational cost of standard sampling methods in high-dimensional model spaces. In this talk, we present recent progress in alleviating this problem by numerically constructing string vacua through a novel framework called JAXVacua (arXiv:2306.06160). At its heart, it makes use of the auto-differentiation and just-in-time compilation features of the Python library JAX. We argue that this method grants access to previously unexplored regimes in the landscape of UV-complete EFTs. Beyond that, we describe how Reinforcement Learning (RL) can be employed to uncover organising principles underlying phenomenologically viable EFTs from string compactifications. Specifically, we use a multi-agent RL implementation known as SwarmRL (arXiv:2307.00994), which is also built on the JAX ecosystem. We demonstrate ways in which RL exploits successful strategies for locating phenomenologically preferable EFTs, thereby revealing unknown structures in the string landscape.
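The jit-plus-autodiff pattern underlying this kind of search can be sketched in a few lines of JAX; the toy potential below is purely illustrative and is not the actual JAXVacua objective:

```python
import jax
import jax.numpy as jnp

# Sketch: jit-compiled, auto-differentiated minimization of a scalar function
# standing in for the vacuum conditions on a moduli-like parameter vector.
def potential(phi):
    return jnp.sum((phi ** 2 - 1.0) ** 2) + 0.1 * jnp.sum(phi[:-1] * phi[1:])

grad_fn = jax.jit(jax.grad(potential))              # compiled gradient

phi = jnp.linspace(-2.0, 2.0, 16)
for _ in range(500):
    phi = phi - 0.01 * grad_fn(phi)                 # plain gradient descent
print(potential(phi), jnp.linalg.norm(grad_fn(phi)))
```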
Neural networks are a powerful tool for an ever-growing list of tasks. However, their enormous complexity often complicates developing theories describing how these networks learn. In our recent work, inspired by the development of statistical mechanics, we have studied the use of collective variables to explain how neural networks learn, specifically the von Neumann entropy and the trace of the empirical neural tangent kernel (NTK). We show that the entropy and trace of the NTK at the start of training can indicate the diversity of the training data and even predict the quality of the model after training. Further work investigates the application of these variables to understand network dynamics better, exploring optimizers for better training and the construction of better network architectures.
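These collective variables are straightforward to compute for a small model; a sketch (with a placeholder MLP and random data) of the empirical NTK, its trace, and the von Neumann entropy of the normalized kernel:

```python
import jax
import jax.numpy as jnp

# Empirical NTK of a tiny MLP at initialization; architecture and data are
# placeholders for illustration.
def init(key, sizes=(8, 32, 1)):
    keys = jax.random.split(key, len(sizes) - 1)
    return [(jax.random.normal(k, (m, n)) / jnp.sqrt(m), jnp.zeros(n))
            for k, m, n in zip(keys, sizes[:-1], sizes[1:])]

def mlp(params, x):
    for w, b in params[:-1]:
        x = jnp.tanh(x @ w + b)
    w, b = params[-1]
    return (x @ w + b).squeeze(-1)

key = jax.random.PRNGKey(0)
params = init(key)
x = jax.random.normal(key, (64, 8))                  # batch of inputs

# NTK_ij = sum over parameters of (df(x_i)/dtheta) . (df(x_j)/dtheta)
jac = jax.jacobian(lambda p: mlp(p, x))(params)
flat = jnp.concatenate([j.reshape(64, -1) for layer in jac for j in layer],
                       axis=1)
ntk = flat @ flat.T

trace = jnp.trace(ntk)
lam = jnp.linalg.eigvalsh(ntk / trace)               # normalized spectrum
lam = jnp.clip(lam, 1e-12)
entropy = -jnp.sum(lam * jnp.log(lam))               # von Neumann entropy
print(trace, entropy)
```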
This talk will be about our work on using machine learning to understand Calabi-Yau metrics. These extra-dimensional metrics determine aspects of the low-energy EFTs arising from string theory that had remained out of reach for decades before the advent of machine learning methods.
We propose a new model-independent method for new physics searches called cluster scanning (CS). It utilises the k-means algorithm to perform clustering in the space of low-level event or jet observables, and separates potentially anomalous clusters, which form the anomaly-rich region, from the rest, which form the anomaly-poor region. The invariant-mass spectra in these two regions are then used to determine whether a resonant signal is present. We apply this approach in a pseudo-analysis using the LHC Olympics R&D dataset and demonstrate performance gains over methods based on global n-parameter function fits commonly used in bump hunting. Emphasis is put on the speed and simplicity of the method.
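A schematic cluster-scanning pass, with illustrative feature arrays, cluster count, and cluster-selection rule standing in for the choices made in the paper:

```python
import numpy as np
from sklearn.cluster import KMeans

# Schematic cluster scanning: cluster jets in observable space, flag the
# clusters most populated in a test mass window as the anomaly-rich region,
# and compare invariant-mass spectra between the two regions.
rng = np.random.default_rng(3)
features = rng.normal(size=(100000, 6))             # low-level jet observables
mass = rng.exponential(500.0, size=100000) + 1000   # dijet invariant mass [GeV]

labels = KMeans(n_clusters=50, n_init=4, random_state=0).fit_predict(features)

window = (mass > 2000) & (mass < 2200)              # test window being scanned
enrichment = np.array([window[labels == k].mean() for k in range(50)])
rich = np.isin(labels, np.argsort(enrichment)[-5:]) # anomaly-rich region

rich_spectrum, edges = np.histogram(mass[rich], bins=60, range=(1000, 4000))
poor_spectrum, _ = np.histogram(mass[~rich], bins=60, range=(1000, 4000))
# A localized excess in rich_spectrum relative to the (normalized)
# poor_spectrum indicates a resonant signal candidate.
```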
In particle physics, the search for phenomena outside the well-established predictions of the Standard Model (SM) is of great importance. For more than four decades, the SM has been the established theory of fundamental particles and their interactions. However, some aspects of nature remain elusive to the explanatory power of the SM. Thus, researchers' attention turns to the pursuit of new processes that can shed light on missing pieces of the model, potentially unveiling entirely new fundamental particles [1].
Within the context of the CERN Large Hadron Collider (LHC), most efforts to unveil new physics are directed toward specific experimental signatures. This strategy has proven exceptionally effective when hunting for preconceived, theoretically motivated particles. However, in cases where a predefined target is absent, the strength of this approach can also become its limitation. To overcome this potential hurdle, researchers engage in model-independent searches, and machine learning (ML) has emerged as the favored path for these explorations [2].
In the vast landscape of ML, Variational Autoencoders (VAEs) have emerged as a powerful tool for detecting anomalies across diverse domains. Their ability to capture the underlying data distribution and reconstruct input samples makes VAEs adept at identifying anomalies or outliers. Nonetheless, the conventional Gaussian distributions that underpin traditional VAEs may not be well-suited for the intricate nature of High Energy Physics (HEP) data. To address this challenge, we propose alternative VAE implementations, including [3]:
Multi-mode Non-Gaussian VAE (MNVAE):
This approach, previously used on complex electromechanical equipment, enhances the encoder's architecture to generate a latent variable governed by a Gaussian mixture model (GMM). The GMM is a linear combination of multiple Gaussian distributions, and it can characterize arbitrarily complex distributions if the number of Gaussian components is large enough. Subsequently, the Householder flow (HF) is employed to endow the latent variable with a full covariance matrix [3].
Monte Carlo-based Approach:
In this method, the latent vector can assume a non-Gaussian distribution, offering a broader range of choices for the posterior distribution while ensuring a tighter Evidence Lower Bound (ELBO). This can result in VAEs capable of capturing finer details within the data distribution, thereby enhancing their generative capabilities and data reconstruction prowess [4].
Our aim is to thoroughly examine the underlying data distributions and subsequently introduce suitable modifications to the VAE framework. Creating a latent space that closely mirrors the data's shape holds the potential to enhance the VAE's ability to capture semantic content, making it better suited for anomaly detection purposes [5]. Later, hls4ml may be employed to synthesize VHDL code, enabling the implementation of the network on an FPGA.
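A minimal sketch of an encoder with a Gaussian-mixture latent in the spirit of the MNVAE variant above; the component count, dimensions, and the Gumbel-softmax relaxation used to keep component sampling differentiable are illustrative choices, and the Householder-flow step is omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of a VAE encoder whose latent follows a Gaussian mixture.
K, LATENT, INPUT = 4, 16, 57

class GMMEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(INPUT, 128), nn.ReLU())
        self.logits = nn.Linear(128, K)              # mixture weights
        self.mu = nn.Linear(128, K * LATENT)
        self.logvar = nn.Linear(128, K * LATENT)

    def forward(self, x):
        h = self.backbone(x)
        # Gumbel-softmax: a differentiable stand-in for sampling a component
        w = F.gumbel_softmax(self.logits(h), tau=0.5, hard=True)   # (B, K)
        mu = self.mu(h).view(-1, K, LATENT)
        std = (0.5 * self.logvar(h)).exp().view(-1, K, LATENT)
        eps = torch.randn_like(std)
        z_all = mu + eps * std                       # reparameterized samples
        z = (w.unsqueeze(-1) * z_all).sum(1)         # select one component
        return z, w, mu, std

enc = GMMEncoder()
x = torch.randn(32, INPUT)                           # e.g. flattened event features
z, w, mu, std = enc(x)                               # multi-modal latent code
```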
References
[1] T. Golling, T. Nobe, D. Proios, J. A. Raine, D. Sengupta, S. Voloshynovskiy, J.-F. Arguin, J. Leissner Martin, J. Pilette, D. Bakshi Gupta and A. Farbin, "The Mass-ive Issue: Anomaly Detection in Jet Physics", arXiv:2303.14134 [hep-ph] (2023).
[2] P. Jawahar, T. Aarrestad, N. Chernyavskaya, M. Pierini, K. A. Wozniak, J. Ngadiuba, J. Duarte and S. Tsan, "Improving Variational Autoencoders for New Physics Detection at the LHC With Normalizing Flows", Frontiers in Big Data 5 (2022), doi:10.3389/fdata.2022.803685.
[3] Q. Luo, J. Chen, Y. Zi, Y. Chang and Y. Feng, "Multi-mode non-Gaussian variational autoencoder network with missing sources for anomaly detection of complex electromechanical equipment", ISA Transactions 134 (2023) 144-158, doi:10.1016/j.isatra.2022.09.009.
[4] A. Thin, N. Kotelevskii, A. Doucet, A. Durmus, E. Moulines and M. Panov, "Monte Carlo Variational Auto-Encoders", arXiv:2106.15921 [stat.ML] (2021).
[5] C. F. Ciușdel, L. M. Itu, S. Cimen, M. Wels, C. Schwemmer, P. Fortner, S. Seitz, F. Andre, S. J. Buß, P. Sharma and S. Rapaka, "Normalizing Flows for Out-of-Distribution Detection: Application to Coronary Artery Segmentation", Applied Sciences 12 (2022) 3839, doi:10.3390/app12083839.
Exploring innovative methods and emerging technologies holds the promise of enhancing the capabilities of LHC experiments and contributing to scientific discoveries. In this work, we propose a new strategy for anomaly detection at the LHC based on unsupervised quantum machine learning algorithms. To accommodate the constraints on the problem size dictated by the limitations of current quantum hardware, we develop a classical autoencoder. The designed quantum models, an unsupervised kernel machine and two clustering algorithms, are trained to detect new-physics events in the latent representation of LHC data generated by the autoencoder. The performance of the quantum algorithms is assessed on different new-physics scenarios, and its dependence on the dimensionality of the latent space and the size of the training dataset is studied. For kernel-based anomaly detection, we identify a regime where the quantum model significantly outperforms its classical counterpart. An instance of the kernel machine is implemented on a quantum computer to verify its suitability for available hardware. We demonstrate that the observed consistent performance advantage is related to the inherent quantum properties of the circuit used.
Binary discrimination between well-defined signal and background datasets is a problem of fundamental importance in particle physics. In this talk, I present a first theoretical study of binary discrimination when the likelihood ratio is infrared and collinear safe, and derive expressions necessary for prediction of the ROC curve at next-to-leading order in the strong coupling. As an example of this framework, I apply it to $H \to b\bar{b}$ versus $g \to b\bar{b}$ discrimination and demonstrate that the description through NLO is required for a qualitative understanding of corresponding results from machine learning studies.
Neural Networks (NN), the backbones of Deep Learning, create field theories through their output ensembles at initialization. Certain limits of NN architecture give rise to free field theories via the Central Limit Theorem (CLT), whereas other regimes give rise to weakly coupled and non-perturbative field theories via small and large deviations from the CLT. I will present a systematic construction of free, weakly interacting, and non-perturbative field theories by tuning different attributes of NN architectures, bringing in methods from statistical physics and a new set of Feynman rules. Some interacting field theories of our choice can be exactly engineered at initialization, by parametrically deforming distributions of stochastic variables in NN architectures. As an example, I will present the construction of $\lambda\phi^4$ scalar field theory via statistical independence breaking of NN parameters in the infinite-width limit.
Recognizing symmetries in data allows for significant boosts in neural network training. In many cases, however, the underlying symmetry is present only in an idealized dataset and is broken in the training data, due to effects such as arbitrary and/or non-uniform detector bin edges. Standard approaches, such as data augmentation or equivariant networks, fail to represent the nature of the full, broken symmetry. We introduce a novel data-augmentation scheme that augments the training set with transformed pre-detector examples which respect the true underlying symmetry and avoid artifacts. In addition, we encourage the network to treat the augmented copies identically, allowing it to learn the broken symmetry. While the technique can be extended to other symmetries, we demonstrate its application on rotational symmetry in particle physics calorimeter images. We find that networks trained with this augmentation converge to a solution more quickly than networks trained without it, and that networks modified to encourage similar internal treatment of augmentations of the same input converge even faster.
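A schematic version of the augmentation-plus-consistency idea, using exact 90-degree rotations of already-binned images as a simple stand-in for transforming pre-detector examples; the network, loss weighting, and data are illustrative:

```python
import torch
import torch.nn as nn

# Sketch: augment with exact rotations and penalize differences between the
# embeddings of augmented copies of the same input.
net = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 128), nn.ReLU())
head = nn.Linear(128, 1)

images = torch.randn(64, 1, 32, 32)                 # toy calorimeter images
labels = torch.randint(0, 2, (64, 1)).float()

views = [torch.rot90(images, k, dims=(2, 3)) for k in range(4)]
embeddings = [net(v) for v in views]

task_loss = nn.functional.binary_cross_entropy_with_logits(
    head(embeddings[0]), labels)
# Consistency term encourages identical internal treatment of all copies.
consistency = sum(((e - embeddings[0]) ** 2).mean() for e in embeddings[1:])
(task_loss + 0.1 * consistency).backward()
```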
We have developed an end-to-end data analysis framework, HEP ML Lab (HML), based on Python for signal-background analysis in high-energy physics research. It offers essential interfaces and shortcuts for event generation, dataset creation, and method application.
With the HML API, a large volume of collision events can be generated in sequence under different settings. The representations module enables easy conversion of event data into the input formats required by various methodologies. The API also includes three categories of analysis methods: cut-based analysis, multivariate analysis, and neural networks, to cater to diverse needs. Together with the built-in metrics, users can make a preliminary assessment of the performance of different analysis methods while using them.
While the high-energy physics research community has already explored several frameworks that integrate data and analysis methods, we advocate for integrating the entire end-to-end process into a single framework. By offering a unified style of programming interface, it reduces the need for researchers to switch between different software and frameworks. This not only simplifies and clarifies the research process, but also facilitates the reproduction of previous research results, leading to more persuasive conclusions.
To demonstrate the convenience and effectiveness of HML, we provide a case study that differentiates between Z jets and QCD jets. We provide benchmark testing for the three built-in methods and ultimately export shareable datasets and model checkpoints.
We present a class of Neural Networks which extends the notion of Energy Flow Networks (EFNs) to higher-order particle correlations. The structure of these networks is inspired by the Energy-Energy Correlators of QFT, which are particularly robust against non-perturbative corrections. By studying the response of our models to the presence and absence of non-perturbative hadronization, we can identify and design networks which are insensitive to the simulated hadronization model, while still optimized for a given performance objective. Moreover, the trained models can give surprising insights into the physics of the problem, for example by spontaneously learning to identify relevant energy scales. We demonstrate our method by training an effective tagger for boosted bosons which shows comparable performance to state of the art methods but minimal sensitivity to theory systematics, which are notoriously difficult for experimentalists to quantify.
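The kind of correlator-inspired input such networks build on can be sketched as a two-point energy-correlator feature map: pairwise energy products histogrammed in angular separation. The binning and the (E, eta, phi) jet layout below are illustrative:

```python
import numpy as np

# Two-point energy-correlator features: normalized E_i * E_j weights binned
# by the pairwise angular distance within the jet.
def eec_features(jet, bins=np.linspace(0.0, 1.0, 11)):
    E, eta, phi = jet[:, 0], jet[:, 1], jet[:, 2]
    dphi = np.abs(phi[:, None] - phi[None, :])
    dphi = np.minimum(dphi, 2 * np.pi - dphi)       # wrap-around in phi
    theta = np.sqrt((eta[:, None] - eta[None, :]) ** 2 + dphi ** 2)
    weights = np.outer(E, E) / E.sum() ** 2         # normalized E_i E_j
    i, j = np.triu_indices(len(E), k=1)             # unique pairs only
    hist, _ = np.histogram(theta[i, j], bins=bins, weights=weights[i, j])
    return hist                                     # input to a small MLP

jet = np.column_stack([np.random.exponential(10, 30),
                       np.random.normal(0, 0.3, 30),
                       np.random.normal(0, 0.3, 30)])
print(eec_features(jet))
```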
A measurement of novel event shapes quantifying the isotropy of collider events is presented, made using 140 fb$^{-1}$ of proton-proton collisions with $\sqrt{s} = 13$ TeV centre-of-mass energy recorded with the ATLAS detector at CERN's Large Hadron Collider. These event shapes are defined as the Energy Mover's Distance between collider events and isotropic reference geometries, evaluated by solving optimal transport problems. Isotropic references with cylindrical and circular symmetries are studied, to probe the symmetries of interest at hadron colliders. The novel event-shape observables defined in this way are infrared- and collinear-safe, have improved dynamic range, and have greater sensitivity to isotropic radiation patterns than other event shapes.
In this talk, we present the ATLAS measurement and some additional variations, applications, and interpretations of the event isotropy. We explore how the observable can be altered, e.g. by varying the distance metric, the reference topology, the underlying geometry, etc., to be more or less sensitive to features of the event. With these studies, one can define event shapes that improve the discrimination power of future searches for rare SM processes or BSM phenomena.
As the performance of the Large Hadron Collider (LHC) continues to improve in terms of energy reach and instantaneous luminosity, ATLAS faces an increasingly challenging environment. High-energy proton-proton ($pp$) interactions, known as hard scatters, are produced alongside low-energy inelastic $pp$ collisions referred to as pile-up. From the perspective of data analyses, hard-scatter events are the processes of interest that probe the quantum scale, whilst pile-up is conceptually no different from noise and so is often removed from reconstructed objects (e.g. jets) during a measurement. As the High Luminosity LHC (HL-LHC) era approaches, current simulations of pile-up are inadequate for the environment presented by an estimated 200 pile-up interactions per bunch crossing. This poses a significant problem for precision measurements at the HL-LHC.
In order to address this, Deep Generative models for fast and precise physics Simulations (DeGeSim) endeavours to utilise deep generative image-synthesis techniques to emulate calorimeter images of soft quantum chromodynamic (QCD) pile-up data collected by ATLAS at the LHC. The project ultimately uses Denoising Diffusion Probabilistic Models (DDPMs) to synthesize calorimeter images based on instances of real (observed) pile-up data collected by the ATLAS detector. However, instead of seeding the generation from Gaussian noise, MC-simulated images of pile-up are used. This is achieved by harnessing the intrinsic Markov-chain process of diffusion models to map MC images to data images, allowing for semantics-based image alteration. The intention is to replace MC-generated calorimeter images with data-informed edited versions within the ATLAS simulation chain, thereby yielding images that better resemble data.
The work that will be presented is a sub-component of the aforementioned model, which addresses a key problem in probability density mapping techniques, such as density ratio estimation, of disjoint probability density functions in which the state spaces lack support. Specifically, we demonstrate that a conditional denoising diffusion probabilistic model (DDPM) augmented with self-conditioning can be used to map between otherwise disjoint pdfs. This is achieved by utilising the conditional behaviour of DDPMs to solve a pseudo-inverse problem of generating a pdf with parameter set $\vec{\theta}^{\prime}$ from an initial data point obtained by sampling the sample space of a different disjoint pdf with parameter set $\vec{\theta}$. This is achieved via the use of density ratio estimators and classifier free guidance. In addition, the proposed model utilises analog bit representations of discrete state spaces to solve the instabilities introduced when dealing with datasets that occupy both continuous and discrete state spaces, as is common in High Energy Particle physics problems.
Jet formation algorithms that utilise eigenvalues of the similarity matrix offer an innovative take on the definition of a jet. This is referred to as spectral clustering. It solves the clustering problem in a non-greedy manner, and so may find more optimal solutions than straightforward agglomerative algorithms. However, the eigenvalue problem is computationally expensive, so in this study the Chebyshev polynomial approximation to the eigenvalue spectrum is applied.
This talk will motivate our interest in spectral clustering for jet formation, and describe the advantages we expect. Some toy datasets that demonstrate this edge are presented.
The time complexity of the algorithm is then discussed, and the Chebyshev polynomial approximation is introduced. Finally, a prototype of the altered algorithm is presented. We share preliminary results that maintain good performance with a significant reduction in time complexity.
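A toy spectral-clustering step of the kind described, clustering particles via the eigenvectors of a normalized Laplacian built from angular distances; the affinity scale is arbitrary, and the Chebyshev-accelerated eigensolver is beyond this sketch:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Toy spectral clustering of particles into jet candidates.
def spectral_labels(eta, phi, n_clusters=2):
    dphi = np.abs(phi[:, None] - phi[None, :])
    dphi = np.minimum(dphi, 2 * np.pi - dphi)
    dist2 = (eta[:, None] - eta[None, :]) ** 2 + dphi ** 2
    affinity = np.exp(-dist2 / 0.1)                  # similarity matrix
    d = affinity.sum(axis=1)
    laplacian = np.eye(len(eta)) - affinity / np.sqrt(np.outer(d, d))
    _, vecs = np.linalg.eigh(laplacian)              # ascending eigenvalues
    embedding = vecs[:, :n_clusters]                 # smallest eigenvectors
    _, labels = kmeans2(embedding, n_clusters, seed=0, minit="++")
    return labels

rng = np.random.default_rng(4)
eta = np.concatenate([rng.normal(-1, 0.1, 20), rng.normal(1, 0.1, 20)])
phi = np.concatenate([rng.normal(0, 0.1, 20), rng.normal(2, 0.1, 20)])
print(spectral_labels(eta, phi))                     # two well-separated "jets"
```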
On average, during Run 2 of the Large Hadron Collider (LHC), 30-50 simultaneous vertices yielding charged and neutral showers, otherwise known as pileup, were recorded per event. This number is expected only to increase at the High Luminosity LHC, with predicted values as high as 200. As such, pileup presents a salient problem that, if not checked, hinders the search for new physics as well as Standard Model precision measurements involving jet energy, jet substructure, missing momentum, and lepton isolation. The existing state-of-the-art pileup mitigation strategies seek to label pileup on a constituent-particle basis. One such methodology, Training Optimal Transport using Attention Learning (TOTAL, arXiv:2211.02029), is the foundation for this work. The TOTAL methodology relies on a transformer architecture with a loss function inspired by optimal transport problems to learn full event topologies. By comparing matched events with and without pileup added, the TOTAL network robustly learns pileup as a transport function, which can be used to reject pileup constituents. In this work, we improve upon the existing TOTAL methodology by reducing its necessary supervision. By no longer requiring the events with and without pileup to be directly matched, we can work in a weakly supervised context comparing real data events with high and low pileup. Despite the reduced supervision, our approach still outperforms existing conventional pileup mitigation approaches. Such an extension of the TOTAL methodology allows for more robust pileup mitigation that is less reliant on simulations, as well as the possibility of online pileup mitigation.
We present a model-agnostic search for new physics in the dijet final state using five different novel machine-learning techniques. Other than the requirement of a narrow dijet resonance, minimal additional assumptions are placed on the signal hypothesis. Signal regions are obtained utilizing multivariate machine learning methods to select jets with anomalous substructure. A collection of complementary methodologies, based on unsupervised, weakly supervised and semi-supervised paradigms, are used in order to maximize sensitivity to unknown new physics signatures.