High-energy collisions at the Large Hadron Collider (LHC) provide valuable insights into open questions in particle physics. However, detector effects must be corrected before measurements can be compared to certain theoretical predictions or measurements from other detectors. Methods to solve this inverse problem of mapping detector observations to theoretical quantities of the underlying...
Building on the success of PC-JeDi, we introduce PC-Droid, a substantially improved diffusion model for the generation of jet particle clouds. By leveraging a new diffusion formulation, studying more recent integration solvers, and training on all jet types simultaneously, we achieve state-of-the-art performance for all types of jets across all evaluation metrics. We study the...
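The abstract does not spell out the new formulation, so as a hedged sketch of what comparing integration solvers involves, the toy code below implements a second-order Heun sampler for the diffusion probability-flow ODE; the noise schedule, array shapes, and the placeholder `denoise` function are illustrative assumptions, not the PC-Droid model.

```python
# Toy Heun (second-order) sampler for the diffusion probability-flow ODE,
# dx/dsigma = (x - D(x, sigma)) / sigma. The denoiser is a placeholder
# standing in for a trained network.
import numpy as np

def denoise(x, sigma):
    # Placeholder: shrinks noisy samples towards zero, mimicking how a
    # trained denoiser pulls samples towards the data manifold.
    return x / (1.0 + sigma**2)

def heun_sampler(shape, sigmas, rng):
    x = rng.standard_normal(shape) * sigmas[0]
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        d_cur = (x - denoise(x, s_cur)) / s_cur        # Euler slope at s_cur
        x_euler = x + (s_next - s_cur) * d_cur         # first-order step
        if s_next > 0:                                 # second-order correction
            d_next = (x_euler - denoise(x_euler, s_next)) / s_next
            x = x + (s_next - s_cur) * 0.5 * (d_cur + d_next)
        else:
            x = x_euler
    return x

rng = np.random.default_rng(0)
sigmas = np.append(np.geomspace(80.0, 1e-3, 30), 0.0)  # decreasing noise levels
clouds = heun_sampler((128, 30, 3), sigmas, rng)       # 128 jets, 30 particles, 3 features
```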
We employ the diffusion framework to generate background-enriched templates for use in a downstream anomaly detection task (typically with CWoLa). We show how Drapes encompasses all modes of template generation common in the literature, and demonstrate state-of-the-art performance on the public LHCO R&D dataset.
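As a hedged illustration of the downstream step, the toy sketch below trains a CWoLa-style classifier to separate a background template from data in a signal region; the Gaussian toy data, feature count, and classifier choice are assumptions for illustration, not the Drapes setup.

```python
# CWoLa-style weak supervision on toy data: a classifier separates the
# generated background template (label 0) from data (label 1); because
# the background populations match, high scores tag signal-like events.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

rng = np.random.default_rng(0)
n = 10_000
template = rng.normal(0.0, 1.0, size=(n, 4))          # background-enriched template
background = rng.normal(0.0, 1.0, size=(n - 500, 4))  # background in data
signal = rng.normal(1.5, 0.5, size=(500, 4))          # small injected signal
data = np.vstack([background, signal])

X = np.vstack([template, data])
y = np.concatenate([np.zeros(len(template)), np.ones(len(data))])

clf = HistGradientBoostingClassifier().fit(X, y)
scores = clf.predict_proba(data)[:, 1]  # anomaly score for each data event
```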
Self-supervised learning (SSL) is a technique to obtain descriptive representations of data in a pretext task based on unlabeled input. Despite being well established in fields such as natural language processing and computer vision, SSL applications in high energy physics (HEP) have only just begun to be explored. Further research into SSL in the context of HEP is especially motivated given...
Accurate reconstruction of particles from detector data forms the core problem in experimental particle physics. The spatial resolution of the detector, in particular the calorimeter granularity, both strongly influences the quality of the reconstruction and largely sets the upper limit on an algorithm's theoretical capabilities. To address these limitations, super-resolution...
Extracting optimal information from upcoming cosmological surveys is a pressing task, for which a promising path to success is performing field-level inference with differentiable forward modeling. A key computational challenge in this approach is that it requires sampling a high-dimensional parameter space. In this talk I will present a promising new method to sample such large parameter...
The simulation of particle physics data is a fundamental but computationally intensive ingredient for physics analysis at the Large Hadron Collider. In traditional fast simulation schemes, a surrogate calorimeter model is the basis for a set of reconstructed particles. We demonstrate the feasibility of generating the reconstructed objects in one step, replacing both the calorimeter simulation...
The matrix element method is the LHC inference method of choice when statistics are limited. We present a dedicated machine learning framework based on efficient phase-space integration and a learned acceptance and transfer function. The framework builds on a choice of INN and diffusion networks, with a transformer to solve the jet combinatorics. We showcase this setup for the CP-phase of the top Yukawa coupling...
Graph Neural Networks are a premier method for learning the physics of a given system, since abstracting physical systems as graphs fits naturally with common descriptions of those systems. I will show how the fundamental processes that shape galaxies and dark matter halos can be learned efficiently by embedding galaxies and halos on either temporal or spatial graphs. Learning the temporal...
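As a minimal sketch of what embedding galaxies on a spatial graph can look like, the toy code below connects galaxies to their nearest neighbours in 3D position; the positions, features, and neighbour count are illustrative assumptions.

```python
# Build a spatial k-nearest-neighbour graph over toy galaxy positions;
# nodes carry galaxy features, edges encode spatial proximity for a GNN.
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
positions = rng.uniform(0, 100, size=(1000, 3))  # toy 3D positions
features = rng.lognormal(size=(1000, 2))         # e.g. stellar mass, size (toy)

adjacency = kneighbors_graph(positions, n_neighbors=8, mode="connectivity")
edge_index = np.vstack(adjacency.nonzero())      # (2, n_edges) GNN-style edge list
```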
Self-Supervised Learning (SSL) is at the core of training modern large ML models, providing a scheme for learning powerful representations in base models that can be used in a variety of downstream tasks. However, SSL training strategies must be adapted to the type of training data, thus driving the question: what are powerful SSL strategies for collider physics data? In the talk, we present a...
We present a novel decorrelation method using Convex Neural Optimal Transport Solvers (Cnots), which is able to decorrelate a continuous feature space against protected attributes with optimal transport. We demonstrate how well it performs in the context of jet classification in high energy physics, where classifier scores should be decorrelated from the jet mass.
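Cnots itself learns a convex neural OT map, which is not reproduced here; as a hedged one-dimensional illustration of the decorrelation goal, the sketch below uses the fact that the 1D optimal transport map to a uniform target is the conditional CDF, applied per mass bin. All data and binning choices are toy assumptions.

```python
# Binned quantile-map baseline: within each mass bin, the rank transform
# (empirical CDF) is the 1D OT map to Uniform(0, 1), so the transformed
# score distribution is the same in every bin, i.e. decorrelated from mass.
import numpy as np

def decorrelate(scores, mass, n_bins=20):
    bins = np.quantile(mass, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.digitize(mass, bins) - 1, 0, n_bins - 1)
    out = np.empty_like(scores)
    for b in range(n_bins):
        sel = idx == b
        out[sel] = scores[sel].argsort().argsort() / max(sel.sum() - 1, 1)
    return out

rng = np.random.default_rng(0)
mass = rng.uniform(50, 300, 100_000)
scores = 1 / (1 + np.exp(-(mass - 150) / 50)) + 0.1 * rng.standard_normal(mass.size)
flat = decorrelate(scores, mass)  # approximately uniform in every mass bin
```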
Machine-learning-based anomaly detection (AD) methods are promising tools for extending the coverage of searches for physics beyond the Standard Model (BSM). One class of AD methods that has received significant attention is resonant anomaly detection, where the BSM signal is assumed to be localized in at least one known variable. While there have been many methods proposed to identify such a BSM...
We propose a new model-independent method for new physics searches called cluster scanning (CS). It uses the k-means algorithm to perform clustering in the space of low-level event or jet observables, and separates potentially anomalous clusters, which form an anomaly-rich region, from the rest, which form an anomaly-poor region. The spectra of the invariant mass in these two regions are then...
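A minimal toy sketch of the CS pipeline is given below; the criterion used to pick the anomaly-rich clusters is a stand-in, and all data, cluster counts, and thresholds are illustrative assumptions.

```python
# Cluster scanning on toy data: k-means in observable space, split clusters
# into anomaly-rich and anomaly-poor regions, then histogram the invariant
# mass in each region for a downstream bump hunt.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
obs = rng.normal(size=(20_000, 8))               # low-level jet observables (toy)
mass = rng.exponential(100.0, size=20_000) + 50  # invariant mass (toy)

km = KMeans(n_clusters=50, n_init=10).fit(obs)

# Stand-in anomaly criterion: rank clusters by how far their centre lies
# from the bulk of the data.
centre_norm = np.linalg.norm(km.cluster_centers_, axis=1)
rich_clusters = np.argsort(centre_norm)[-5:]     # 5 most outlying clusters

rich = np.isin(km.labels_, rich_clusters)
hist_rich, edges = np.histogram(mass[rich], bins=40, range=(50, 500))
hist_poor, _ = np.histogram(mass[~rich], bins=40, range=(50, 500))
# The two spectra are then compared to look for a localized excess.
```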
Image-to-image translation is an important problem across various fields, including Cosmology and Astrophysics, where it can facilitate unraveling the mysteries of the universe. While many empirical approaches have been proposed to address this problem, they often lack a solid theoretical basis that could generalize them.
In this work, we explore the image-to-image...
In High Energy Physics, detailed and time-consuming simulations are used for particle interactions with detectors. To bypass these simulations with a generative model, it needs to be able to generate large point clouds in a short time while correctly modeling complex dependencies between the particles.
For non-sparse problems on a regular grid, such a model would usually use (De-)Convolution...
The quest for primordial B-modes in cosmic microwave background (CMB) observations requires a refined model of the Galactic dust foreground. We investigate diffusion-based models of the dust foreground and their interest for both component separation and cosmological inference. First, under the assumption of a Gaussian CMB with known cosmology, we show that diffusion models can be trained on...
In this work, we investigate state-of-the-art deep-learning techniques for domain transfer applied to astrophysical images of simulated galaxies. Our main objective is to infer astrophysical properties, including the galactic dark matter distribution, from observational data, such as radio interferometry from the upcoming Square Kilometre Array (SKA) Observatory.
To achieve this, we leverage...
Detecting strong lenses in a large dataset such as Euclid is very challenging due to the imbalanced nature of the dataset. Existing CNN models produce large numbers of false positives; for example, a single strong-lens candidate can be accompanied by hundreds of false positives in the final sample. To overcome this challenge, we have developed a novel ML pipeline called DenseLens, which consists of...
The damping wing signature of high-redshift quasars in the intergalactic medium (IGM) provides a unique way of probing the history of reionization. Next-generation surveys will collect a multitude of spectra that call for powerful statistical methods to constrain the underlying astrophysical parameters such as the global IGM neutral fraction as tightly as possible. Inferring these parameters...
A common issue in both research and industry is growing data volumes and the ever-increasing need for data storage. With experiments taking more complex data at higher rates, the data recorded is quickly outgrowing storage capabilities [1]. Since the data formats used are already highly compressed, storage constraints would require more drastic measures such as more...
The generation of collider data using machine learning has emerged as a prominent research topic in particle physics due to the increasing computational challenges associated with traditional Monte Carlo simulation methods, particularly for future colliders with higher luminosity. Representing collider data as particle clouds offers clear benefits. The underlying physics provides...
Reconstructing accurate sky models from dirty radio images is crucial for advancing high-redshift galaxy evolution studies, especially when using ALMA. Existing pipelines often employ CLEAN algorithms followed by source detection methods. However, these pipelines struggle in low-SNR scenarios and cannot directly recover idealized, noise-free sky models.
We present a novel framework that...
I will present a novel methodology to address many ill-posed inverse problems by providing a description of the posterior distribution, which enables us to obtain point-estimate solutions and to quantify their associated uncertainties. Our approach combines Neural Score Matching and a novel posterior sampling method based on an annealed HMC algorithm to sample the full high-dimensional posterior...
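As a hedged toy illustration of annealed HMC, the sketch below runs Hamiltonian Monte Carlo on a tempered target with a decreasing temperature schedule; the quadratic potential stands in for a neural-score-based log-posterior, and all step sizes and schedules are illustrative.

```python
# Annealed HMC on a toy target exp(-U(x)/temp): high temperatures let the
# sampler explore broadly, then the chain settles into the temp = 1 posterior.
import numpy as np

def U(x):          # negative log-posterior; toy standard normal
    return 0.5 * np.sum(x * x)

def grad_U(x):
    return x

def hmc_step(x, temp, eps, n_leap, rng):
    """One leapfrog HMC step targeting exp(-U(x) / temp)."""
    p = rng.standard_normal(x.shape)
    x_new, p_new = x.copy(), p - 0.5 * eps * grad_U(x) / temp
    for i in range(n_leap):
        x_new = x_new + eps * p_new
        if i < n_leap - 1:
            p_new = p_new - eps * grad_U(x_new) / temp
    p_new = p_new - 0.5 * eps * grad_U(x_new) / temp
    h_old = U(x) / temp + 0.5 * np.sum(p * p)
    h_new = U(x_new) / temp + 0.5 * np.sum(p_new * p_new)
    return x_new if rng.uniform() < np.exp(h_old - h_new) else x

rng = np.random.default_rng(0)
x = rng.standard_normal(2) * 10.0
for temp in np.geomspace(100.0, 1.0, 50):  # annealing schedule
    for _ in range(10):
        x = hmc_step(x, temp, eps=0.1, n_leap=20, rng=rng)
# x is now approximately a draw from the temp = 1 posterior
```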
The simulation of calorimeter showers is computationally intensive, leading to the development of generative models as substitutes. We propose a framework for designing generative models for calorimeter showers that combines the strengths of voxel and point cloud approaches to improve both accuracy and computational efficiency. Our approach employs a pyramid-shaped design, where the base of...
The injection of physics principles for training machine learning algorithms is an active area of research and development within the particle physics community. In this contribution we present a novel methodology, based on differentiable programming tools, for pattern recognition and track fitting in muon chambers with high noise rates.
The developed architecture centers around...
In this study, we present a novel information-theoretic framework, termed TURBO, designed to systematically analyse and generalise auto-encoding methods. We examine the principles of information bottleneck and bottleneck-based networks in the auto-encoding setting and identify their inherent limitations, which become more prominent for data with multiple relevant, physics-related...
Optimizing observational astronomy campaigns is becoming a complex and expensive task for next-generation telescopes, where manual planning of observations tends to yield suboptimal results.
Reinforcement Learning (RL) has been well-demonstrated as a valuable approach for training autonomous systems, and it may provide the basis for self-driving telescopes capable...
Unsupervised and weakly supervised techniques in machine learning can boost conventional methods for anomaly detection in HEP and open up a path for model-agnostic searches. Challenges posed by HEP data, including its voluminous nature and intricate structure, are addressed, along with insights drawn from studies of manifold models.
The properties of the hydrogen atom have played a central role in fundamental physics for the past 200 years. The CPT theorem, a cornerstone of the Standard Model, requires that hydrogen and antihydrogen ($\bar{H}$) have the same properties. The ALPHA antihydrogen experiment tests this prediction by measuring the fundamental properties of antihydrogen. We have previously measured the...
The need for greater flexibility, faster turnaround times, reduced energy consumption, and reduced operational cost at maximum physics output, together with the sheer size of potential future accelerators such as the FCC, calls for new accelerator operational models with automation at the center. AI/ML is already playing a significant role in the accelerator domain, with numerous applications in...
In high-energy physics experiments, tracking, the reconstruction of particle trajectories from hits in the inner detector, is a computationally intensive task due to the large combinatorics of detector signals. Recent efforts have proven that ML techniques can be successfully applied to the tracking problem, extending and improving the conventional methods based on feature engineering....
The BERT pretraining paradigm has proven to be highly effective in many domains, including natural language processing, image processing, and biology. To apply the BERT paradigm, the data need to be described as a set of tokens, and each token needs to be labelled. To date, the BERT paradigm has not been explored in the context of HEP. The samples that form the data used in HEP can be described...
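As a hedged sketch of what the BERT pretext task looks like once HEP samples are tokenised, the toy code below masks a fraction of token IDs and trains a small Transformer encoder to recover them; the vocabulary, masking rate, and model size are illustrative assumptions, not a proposed configuration.

```python
# Masked-token pretraining: corrupt a fraction of tokens with [MASK] and
# train the encoder to predict the original ids at the masked positions.
import torch
import torch.nn as nn

VOCAB, MASK_ID, SEQ_LEN = 512, 0, 64     # token ids 1..511; 0 is [MASK]

embed = nn.Embedding(VOCAB, 128)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=128, nhead=8, batch_first=True),
    num_layers=4,
)
head = nn.Linear(128, VOCAB)             # predicts the original token id
opt = torch.optim.Adam(
    list(embed.parameters()) + list(encoder.parameters()) + list(head.parameters()),
    lr=1e-4,
)

tokens = torch.randint(1, VOCAB, (32, SEQ_LEN))  # stand-in tokenised events
mask = torch.rand(tokens.shape) < 0.15           # mask 15% of tokens
corrupted = tokens.masked_fill(mask, MASK_ID)

logits = head(encoder(embed(corrupted)))
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])
opt.zero_grad(); loss.backward(); opt.step()
```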
We pursue the use of Transformers to compute scattering amplitudes in planar $\mathcal{N} = 4$ super-Yang-Mills theory, a quantum field theory closely related to Quantum Chromodynamics (QCD). By expanding multiple polylogarithm functions in the Feynman integrals using the symbol map, we formulate scattering amplitudes in a language-based representation that is amenable to Transformer architectures and...
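As a minimal sketch of the language-based representation, the toy code below encodes one tensor-product term of a symbol as a sequence of integer token IDs, the form a Transformer consumes; the alphabet and the example term are illustrative assumptions.

```python
# A symbol term is a word over a finite alphabet of "letters"; mapping each
# letter to an integer id yields Transformer-ready token sequences. The
# six-letter alphabet and the example term are toy choices.
LETTERS = ["a", "b", "c", "d", "e", "f"]
PAD, BOS, EOS = 0, 1, 2
vocab = {letter: i + 3 for i, letter in enumerate(LETTERS)}

def encode(symbol_word, max_len=10):
    """Map a symbol term, e.g. ('a', 'b', 'c'), to padded token ids."""
    ids = [BOS] + [vocab[letter] for letter in symbol_word] + [EOS]
    return ids + [PAD] * (max_len - len(ids))

term = ("a", "b", "a", "c")   # one tensor-product term of the symbol
print(encode(term))           # [1, 3, 4, 3, 5, 2, 0, 0, 0, 0]
```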