As detector technologies improve, the increase in resolution, number of channels, and overall size creates immense bandwidth challenges for the data acquisition system, long data-center compute times, and growing data storage costs. Much of the raw data does not contain useful information and can be significantly reduced with veto and compression systems as well as online analysis.
We design...
Particle flow reconstruction is crucial to analyses performed at general-purpose detectors, such as ATLAS and CMS. Recent developments have shown that a machine-learned particle-flow reconstruction using graph neural networks offers a prospect for computationally efficient event reconstruction [1-2]. Focusing on the scalability of machine-learning based models for full event reconstruction, we...
The High-Luminosity LHC (HL-LHC) will provide an order of magnitude increase in integrated luminosity and enhance the discovery reach for new phenomena. The increased pile-up foreseen during the HL-LHC necessitates major upgrades to the ATLAS detector and trigger. The Phase-II trigger will consist of two levels, a hardware-based Level-0 trigger and an Event Filter (EF) with tracking...
The combinatorics of track seeding has long been a computational bottleneck for triggering and offline computing in High Energy Physics (HEP), and remains so for the HL-LHC. Next-generation pixel sensors will be sufficiently fine-grained to determine the angular information of the charged particles passing through them. This detector technology immediately improves the...
Computing demands for large scientific experiments, such as the CMS experiment at CERN, will increase dramatically in the next decades. To complement the future performance increases of software running on CPUs, explorations of coprocessor usage in data processing hold great potential and interest. We explore the novel approach of Services for Optimized Network Inference on Coprocessors...
Due to the stochastic nature of hadronic interactions, particle showers from hadrons can vary greatly in their size and shape. Recovering all energy deposits from a hadronic shower within a calorimeter into a single cluster can be challenging and requires an algorithm that accommodates the large variation present in such showers. In this study, we demonstrate the potential of a deep learning...
In 2026 the Phase-II Upgrade will enhance the LHC to become the High Luminosity LHC, with up to 7 times the nominal LHC luminosity. This leads to an increase in interesting events, which might open the door to detecting new physics. However, it also leads to a major increase in proton-proton collisions producing mostly low-energy hadronic particles, called pile-up. Up to 200...
A novel data collection system, known as Level-1 (L1) Scouting, is being introduced as part of the L1 trigger of the CMS experiment at the CERN LHC. The L1 trigger of CMS, implemented in FPGA-based hardware, selects events at 100 kHz for full read-out, within a short 3 microsecond latency window. The L1 Scouting system collects and stores the reconstructed particle primitives and intermediate...
Decision Forests are fast and effective machine learning models for making real time predictions. In the context of the hardware triggers of the experiments at the Large Hadron Collider, DF inference is deployed on FPGA processors with sub-microsecond latency requirements. The FPGAs may be executing many algorithms, and many DFs, motivating resource-constrained inference. Using a jet tagging...
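Since this abstract centers on resource-constrained DF inference, a minimal training-side sketch may help; everything here (feature count, labels, depth, tree count) is a hypothetical stand-in, chosen only to show how depth and ensemble size, the main drivers of FPGA resources and latency, enter.

```python
# A hedged sketch: fit a small decision forest on stand-in jet features.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))                  # hypothetical jet features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # hypothetical signal label

# Depth and tree count directly control FPGA resources and latency,
# so resource-constrained inference starts from a deliberately small forest.
clf = GradientBoostingClassifier(n_estimators=20, max_depth=3)
clf.fit(X, y)
scores = clf.decision_function(X)                # per-jet discriminant
```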
We introduce the fwXmachina framework for evaluating boosted decision trees on FPGA for implementation in real-time systems. The software and electrical engineering designs are introduced, with both physics and firmware performance detailed. The test bench setup is described. We present an example problem in which fwXmachina may be used to improve the identification of vector boson fusion...
We present the preparation, deployment, and testing of an autoencoder trained for unbiased detection of new physics signatures in the CMS experiment Global Trigger test crate FPGAs during LHC Run 3. The Global Trigger makes the final decision whether to read out or discard the data from each LHC collision, occurring at a rate of 40 MHz, within a 50 ns latency. The Neural Network makes a...
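A minimal sketch of the underlying idea, assuming trigger quantities flattened into a fixed-length vector; the 57-dimensional input, layer sizes, and random training data are illustrative stand-ins, not the deployed network. The anomaly score is the reconstruction error, so collisions that the autoencoder, trained on ordinary events, reconstructs poorly are flagged.

```python
import numpy as np
import tensorflow as tf

inputs = tf.keras.Input(shape=(57,))                  # hypothetical input size
h = tf.keras.layers.Dense(16, activation="relu")(inputs)
z = tf.keras.layers.Dense(3, activation="relu")(h)    # bottleneck
h = tf.keras.layers.Dense(16, activation="relu")(z)
outputs = tf.keras.layers.Dense(57)(h)
ae = tf.keras.Model(inputs, outputs)
ae.compile(optimizer="adam", loss="mse")

x = np.random.normal(size=(256, 57)).astype("float32")  # stand-in for ordinary events
ae.fit(x, x, epochs=1, verbose=0)
score = np.mean((ae.predict(x, verbose=0) - x) ** 2, axis=1)  # anomaly score
```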
We describe an application of the deep decision trees, described in fwXmachina parts 1 and 2 at this conference, for anomaly detection on FPGAs in real-time systems. A novel method to train the decision-tree-based autoencoder is presented. We give an example in which fwXmachina may be used to detect a variety of different BSM models via anomaly detection at the...
In the coming years the ATLAS experiment will undertake major upgrades to cope with the expected increase of luminosity provided by Phase II of the LHC accelerator. In particular, in the barrel of the muon spectrometer a new triplet of RPC detectors will be added and the trigger logic will be performed on FPGAs. We have implemented a new CNN architecture that is able to identify the muon...
This work describes the investigation of neuromorphic-computing-based spiking neural network (SNN) models used to filter data from sensor electronics in the CMS experiment at the High Luminosity Large Hadron Collider (HL-LHC). We present our approach for developing a compact neuromorphic model that filters out the sensor data based on the particle's transverse momentum...
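To make the spiking paradigm concrete, here is a minimal leaky integrate-and-fire neuron, the basic unit of most SNNs; the leak factor and threshold are illustrative, and this is a pedagogical sketch rather than the model described above.

```python
import numpy as np

def lif(inputs, beta=0.9, threshold=1.0):
    """Leaky integrate-and-fire: the membrane potential decays by `beta`
    each step, integrates the input, and emits a spike (then resets)
    when it crosses `threshold`."""
    v, spikes = 0.0, []
    for s in inputs:
        v = beta * v + s
        fired = v >= threshold
        spikes.append(int(fired))
        if fired:
            v = 0.0
    return spikes

print(lif(np.array([0.4, 0.4, 0.4, 0.0, 0.9, 0.9])))  # [0, 0, 1, 0, 0, 1]
```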
The processing of large volumes of high precision data generated by sophisticated detectors in high-rate collisions poses a significant challenge for major high-energy nuclear and particle experiments. To address this challenge and revolutionize real-time data processing pipelines, modern deep neural network techniques and AI-centric hardware innovations are being developed.
The sPHENIX...
The Large Hadron Collider will be upgraded to the High Luminosity LHC, delivering many more simultaneous proton-proton collisions and extending the sensitivity to rare processes. The CMS detector will be upgraded with new, highly granular detectors in order to maintain performance in the busy environment with many overlapping collisions (pileup). For the first time, tracks from charged particles...
The future LHC High-Luminosity upgrade will amplify the proton collision rate by a factor of about 5-7, posing challenges for physics object reconstruction and identification, including tau and b-jet tagging. Detecting both taus and bottom quarks at the CMS Level-1 (L1) trigger enhances many important physics analyses in the experiment. The challenge of the L1 trigger system requires...
Data storage is a major limitation at the Large Hadron Collider and is currently addressed by discarding a large fraction of data. We present an autoencoder based lossy compression algorithm as a first step towards a solution to mitigate this problem, potentially enabling storage of more events. We deploy an autoencoder model, on Field Programmable Gate Array (FPGA) firmware using the hls4ml...
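Since the abstract names hls4ml, a minimal sketch of its conversion flow may help; the encoder architecture, output directory, and FPGA part are assumptions for illustration, not the authors' configuration.

```python
import tensorflow as tf
import hls4ml

# A small stand-in encoder; the real compression model is not reproduced here.
encoder = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(8),                     # compressed representation
])

config = hls4ml.utils.config_from_keras_model(encoder, granularity="name")
hls_model = hls4ml.converters.convert_from_keras_model(
    encoder, hls_config=config,
    output_dir="hls_encoder", part="xcvu9p-flga2104-2-e",
)
hls_model.compile()  # bit-accurate C simulation of the generated firmware
```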
With machine learning gaining more and more popularity as a physics analysis tool, physics computing centers, such as the Fermilab LHC Physics Center (LPC), are seeing huge increases in the use of their resources for such algorithms. These facilities, however, are not generally set up efficiently for machine learning inference, as they rely on slower CPU evaluation, which has a noticeable...
The upcoming high-luminosity upgrade of the LHC will lead to a factor of five increase in instantaneous luminosity during proton-proton collisions. Consequently, the experiments situated around the collider ring, such as the CMS experiment, will record approximately ten times more data. Furthermore, the luminosity increase will result in significantly higher data complexity, thus making more...
The challenging environment of real-time systems at the Large Hadron Collider (LHC) strictly limits the computational complexity of algorithms that can be deployed. For deep learning models, this implies only smaller models that have lower capacity and weaker inductive bias are feasible. To address this issue, we utilize knowledge distillation to leverage both the performance of large models...
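For readers unfamiliar with knowledge distillation, the standard objective (in the usual Hinton-style formulation; the abstract does not specify its exact variant) combines the hard-label loss with a temperature-softened match to the teacher:

```python
import tensorflow as tf

def distillation_loss(y_true, student_logits, teacher_logits, T=4.0, alpha=0.5):
    """Hard-label cross-entropy plus KL divergence to the temperature-
    softened teacher distribution; T and alpha are illustrative values."""
    hard = tf.keras.losses.sparse_categorical_crossentropy(
        y_true, student_logits, from_logits=True)
    soft = tf.keras.losses.kl_divergence(
        tf.nn.softmax(teacher_logits / T),
        tf.nn.softmax(student_logits / T)) * (T ** 2)   # standard T^2 rescaling
    return alpha * hard + (1.0 - alpha) * soft
```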
The exceptional challenges in data acquisition faced by experiments at the LHC demand extremely robust trigger systems. The ATLAS trigger, after a fast hardware data processing step, uses software-based selections referred to as the High-Level Trigger (HLT). Jets originating from b-quarks (b-jets) are produced in many interesting fundamental interactions, making them a key signature in a broad...
BDTs are simple yet powerful ML algorithms with performance often on par with cutting-edge NN-based models. The structure of BDTs allows for a highly parallelized, low-latency implementation in FPGAs. I will describe the development and implementation of a BDT-based algorithm for tau lepton identification in the ATLAS Level-1 trigger system as part of the Phase-I upgrade, designed to be...
The High Luminosity upgrade to the LHC will deliver unprecedented luminosity to the experiments, culminating in up to 200 overlapping proton-proton collisions. In order to cope with this challenge several elements of the CMS detector are being completely redesigned and rebuilt. The Level-1 Trigger is one such element; it will have a 12.5 microsecond window in which to process protons colliding...
Extracting low-energy signals from LArTPC detectors is useful, for example, for detecting supernova events or calibrating the energy scale with argon-39. However, it is difficult to extract the signals efficiently because of noise. We propose using a 1D CNN to select wire traces that contain a signal. This suppresses the background effectively while retaining high signal efficiency. This is...
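A minimal sketch of such a selector, assuming each wire trace is a fixed-length window of ADC samples; the window length and layer sizes are illustrative, not the trained network.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(200, 1)),           # 200 ADC samples per trace
    tf.keras.layers.Conv1D(8, 7, activation="relu"),
    tf.keras.layers.MaxPooling1D(4),
    tf.keras.layers.Conv1D(16, 5, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # P(trace contains signal)
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```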
Graph structures are a natural representation of data in many fields of research, including particle and nuclear physics experiments, and graph neural networks (GNNs) are a popular approach to extracting information from such data. At the same time, there is often a need for very low-latency evaluation of GNNs on FPGAs. The HLS4ML framework for translating machine learning models from industry-standard...
Within the framework of the L1 trigger's data filtering mechanism, ultra-fast autoencoders are instrumental in capturing new physics anomalies. Given the immense influx of data at the LHC, these networks must operate in real-time, making rapid decisions to sift through vast volumes of data. Meeting this demand for speed without sacrificing accuracy becomes essential, especially when...
Recent years have witnessed the enormous success of transformer models in various research fields, including Natural Language Processing and Computer Vision, as well as the natural sciences. In the HEP community, models with transformer backbones have shown their power in jet tagging tasks. However, despite the impressive performance, transformer-based models are often large and...
The field of Astrodynamics faces a significant challenge due to the increasing number of space objects orbiting Earth, especially from recent satellite constellation deployments. This surge underscores the need for quicker and more efficient algorithms for orbit propagation and determination to mitigate collision risks in both Earth-bound and interplanetary missions on large scales. Often,...
Gamma-ray bursts (GRBs) have traditionally been categorized based on their durations. However, the emergence of extended emission (EE) GRBs, characterized by durations longer than two seconds and properties similar to short GRBs, challenges conventional classification methods. In this talk, we delve into GRB classification, focusing on a machine-learning technique (t-distributed stochastic...
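A minimal sketch of the embedding step, assuming each GRB has been summarized as a feature vector (for example, binned light-curve or duration/hardness features); the random data here are stand-ins.

```python
import numpy as np
from sklearn.manifold import TSNE

features = np.random.rand(500, 64)     # hypothetical per-GRB feature vectors
embedding = TSNE(n_components=2, perplexity=30).fit_transform(features)
# Clusters in `embedding` can then be compared against the
# short / long / extended-emission classes.
```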
Deep Learning assisted Anomaly detection is quickly becoming a powerful tool allowing for the rapid identification of new phenomena.
We present the application of anomaly detection techniques based on deep recurrent autoencoders to the problem of detecting gravitational wave signals in laser interferometers. This class of algorithm is trained via a semi-supervised strategy, i.e. with a weak...
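A minimal sketch of a deep recurrent autoencoder of the kind described, assuming fixed-length strain windows; the window length and layer widths are illustrative. Trained mostly on background, it reconstructs signal-like segments poorly, which is what the anomaly score exploits.

```python
import tensorflow as tf

T, D = 128, 1                                    # window length, channels
inputs = tf.keras.Input(shape=(T, D))
z = tf.keras.layers.LSTM(32)(inputs)             # encode window to one vector
h = tf.keras.layers.RepeatVector(T)(z)
h = tf.keras.layers.LSTM(32, return_sequences=True)(h)
outputs = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(D))(h)
ae = tf.keras.Model(inputs, outputs)
ae.compile(optimizer="adam", loss="mse")         # score = reconstruction error
```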
Deep Learning (DL) applications for gravitational-wave (GW) physics are becoming increasingly common without the infrastructure to be validated at scale or deployed in real time. With ever more sensitive GW observing runs beginning in 2023, the tradeoff between speed and data robustness must be bridged in order to create experimental pipelines which take less time to iterate on and which...
The Deep Underground Neutrino Experiment (DUNE) presents promising approaches to better identify and understand supernova (SN) events. Using simulated Liquid Argon Time Projection Chamber (LArTPC) data, we develop an end-to-end edge-AI pipeline that has the potential to significantly reduce SN pointing time. Using a sequence of machine learning algorithms, we are able to reject radiological...
In the Fermilab accelerator complex, the Main Injector (MI) and the Recycler Ring (RR) share a tunnel. The initial design was made for the needs of the Tevatron, where the RR stored fairly low intensities of anti-protons. Currently, however, both the MI and RR often have high intensity beams at the same time. Beam loss monitors (BLMs) are placed at different points in the tunnel to detect...
Superconducting (SC) magnets deployed at any accelerator complex must reach exceptionally high currents to accurately control particle trajectories. During operation, superconducting magnets occasionally experience a spontaneous transition from the superconducting to the normal state while operating at several kiloamps (quenching). Quenches may significantly damage the magnet, preventing SC...
The Tokamak magnetic confinement fusion device is one leading concept design for future fusion reactors which require extremely careful control of plasma parameters and magnetic fields to prevent fatal instabilities. Magneto-hydrodynamic (MHD) instabilities occur when plasma confinement becomes unstable as a result of distorted non-axisymmetric magnetic field lines. These ``mode''...
Segmentation is the assignment of a semantic class to every pixel in an image, and is a prerequisite for downstream analysis like phase quantification, morphological characterization, etc. The wide range of length scales, imaging techniques and materials studied in materials science means any segmentation algorithm must generalise to unseen data and support abstract, user-defined semantic...
Increased development and utilization of multimodal scanning probe microscopy (SPM) and spectroscopy techniques have led to an orders-of-magnitude increase in the volume, velocity, and variety of collected data. While larger datasets have certain advantages, practical challenges arise from their increased complexity including the extraction and analysis of actionable scientific information. In...
Materials have marked human evolution throughout history. The next technological advancement will inevitably be based on a groundbreaking material. Future discovery and application of materials in technology necessitates precise methods capable of creating long-range, non-equilibrium structures with atomic accuracy. To achieve this, we need enhanced analysis tools and swift automated...
Accurate and reliable long-term operational forecasting is of paramount importance in numerous domains, including weather prediction, environmental monitoring, early warning of hazards, and decision-making processes. Spatiotemporal forecasting involves generating temporal forecasts for system state variables across spatial regions. Data-driven methods such as Convolutional Long Short-Term...
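As a hedged illustration of the ConvLSTM family mentioned above, the sketch below maps a sequence of 2D fields to the next field; grid size, sequence length, and depths are illustrative.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10, 64, 64, 1)),        # 10 past frames
    tf.keras.layers.ConvLSTM2D(16, kernel_size=3, padding="same",
                               return_sequences=False),
    tf.keras.layers.Conv2D(1, kernel_size=3, padding="same"),  # next frame
])
model.compile(optimizer="adam", loss="mse")
```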
Surgical data technologies have not only successfully integrated inputs from various data sources (e.g., medical devices, trackers, robots and cameras) but have also applied a range of machine learning and deep learning methods (e.g., classification, segmentation or synthesis) to data-driven interventional healthcare. However, the diversity of data, acquisitions and pre-processing...
The use of neural networks for approximating fermionic wave functions has become popular over the past few years as their ability to provide impressively accurate descriptions of molecules, nuclei, and solids has become clear.
Most electronic structure methods rely on uncontrolled approximations, such as the choice of exchange-correlation functional in density functional theory or the form...
High-dimensionality is known to be the bottleneck for both nonparametric regression and Delaunay triangulation. To efficiently exploit the geometric information for nonparametric regression without conducting the Delaunay triangulation for the entire feature space, we develop the crystallization search for the neighbour Delaunay simplices of the target point similar to crystal growth. We...
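The crystallization search itself is not reproduced here; as a point of reference, the sketch below shows the baseline it avoids: triangulating the whole feature space, locating the simplex containing a query point, and interpolating its vertex responses barycentrically.

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)
X = rng.random((200, 2))                   # training features
y = np.sin(X[:, 0]) + X[:, 1] ** 2         # training responses

tri = Delaunay(X)                          # global triangulation (the bottleneck)
q = np.array([0.4, 0.6])                   # query point
s = tri.find_simplex(q)
verts = tri.simplices[s]                   # vertex indices of containing simplex

# Barycentric coordinates of q within that simplex.
Tm = tri.transform[s]
b = Tm[:2].dot(q - Tm[2])
w = np.append(b, 1.0 - b.sum())
y_hat = w.dot(y[verts])                    # piecewise-linear prediction
```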
Given the increasing volume and quality of genomics data, extracting new insights requires efficient and interpretable machine-learning models. This work presents Genomic Interpreter, a novel architecture for genomic assay prediction that outperforms state-of-the-art models on genomic assay prediction tasks and can identify hierarchical dependencies in genomic sites. This...
Neural networks achieve state-of-the-art performance in image classification, medical analysis, particle physics and many more application areas. With the ever-increasing need for faster computation and lower power consumption, driven by real-time systems and the Internet of Things (IoT), field-programmable gate arrays (FPGAs) have emerged as suitable accelerators for deep learning applications....
Today’s deep learning models consume considerable computation and memory resources, leading to significant energy consumption. To address the computation and memory challenges, quantization is often used to store and compute data with as few bits as possible. However, exploiting efficient quantization for computing a given ML model is challenging, because it affects both the computation accuracy...
For many deep learning applications, model size and inference speed at deployment time become a major challenge. To tackle these issues, a promising strategy is quantization.
A straightforward uniform quantization to very low precision often results in considerable accuracy loss. A solution to this predicament is the usage of mixed-precision quantization, founded on the idea that certain...
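A minimal sketch of mixed-precision quantization using QKeras (one common tool for this; the abstract does not name it): different layers get different bit widths, here hand-picked stand-ins for what a sensitivity search would assign.

```python
import tensorflow as tf
from qkeras import QDense, QActivation, quantized_bits, quantized_relu

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(16,)),
    QDense(32, kernel_quantizer=quantized_bits(8, 0, alpha=1),   # sensitive layer: 8 bits
           bias_quantizer=quantized_bits(8, 0, alpha=1)),
    QActivation(quantized_relu(6)),
    QDense(5, kernel_quantizer=quantized_bits(4, 0, alpha=1),    # tolerant layer: 4 bits
           bias_quantizer=quantized_bits(4, 0, alpha=1)),
])
```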
Event detection in time series data plays a crucial role in various domains, including finance, healthcare, environmental monitoring, cybersecurity, and science. Accurately identifying and understanding events in time series data is vital for making informed decisions, detecting anomalies, and predicting future trends. Extensive research has explored diverse methods for event detection in time...
Scientific experiments rely on machine learning at the edge to process extreme volumes of real-time streaming data. Extreme edge computation often requires robustness to faults, e.g., to function correctly in high radiation environments or to reduce the effects of transient errors. As such, the computation must be designed with fault tolerance as a primary objective. FKeras is a tool that...
There has been a growing trend of multi-modal AI models capable of gathering data from multiple sensor modalities (cameras, lidars, radars, etc.) and processing it to give more comprehensive outputs and predictions. Neural network models, such as Transformers, convolutional neural networks (CNNs), etc., exhibit the ability to process data from multiple modalities and have enhanced various...
Field-programmable gate arrays (FPGAs) are widely used to implement deep learning inference. Standard deep neural network inference involves the computation of interleaved linear maps and nonlinear activation functions. Prior work for ultra-low latency implementations has hardcoded the combination of linear maps and nonlinear activations inside FPGA lookup tables (LUTs). Our work is motivated...
Machine learning has been applied to many areas of clinical medicine, from assisting radiologists with scan interpretation to clinical early warning scoring systems. However, the possibilities of ML-assisted real-time data interpretation, and the hardware needed to realise it, are yet to be fully explored. In this talk, possible applications of fast ML hardware to real-time medical imaging will...
Converged compute infrastructure refers to a trend where HPC clusters are set up for both AI and traditional HPC workloads, allowing these workloads to run on the same infrastructure, potentially reducing underutilization. Here, we explore opportunities for converged compute with GroqChip, an AI accelerator optimized for running large-scale inference workloads with high throughput and...
Machine Learning has gone through major revolutionary phases over the past decade and neural networks have become state-of-the-art approaches in many applications, from computer vision to natural language processing. However, these advances come at ever-growing computational costs; in contrast, CMOS scaling is hitting fundamental limitations such as power consumption and quantum mechanical...
Quantum readout and control is a fundamental aspect of quantum computing that requires accurate measurement of qubit states. Errors emerge in all stages, from initialization to readout, and identifying errors in post-processing necessitates resource-intensive statistical analysis. In our work, we use a lightweight fully-connected neural network (NN) to classify states of a superconducting...
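A minimal sketch of such a lightweight classifier, assuming each readout shot is reduced to a demodulated (I, Q) pair; the sizes are illustrative and the real model may operate on raw readout traces instead.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(2,)),                # (I, Q) per shot
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),   # P(|0>), P(|1>)
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```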
Convolutional Neural Networks (CNNs) have been applied to a wide range of applications in high energy physics including jet tagging and calorimetry. Due to their computational intensity, a large amount of work has been done to accelerate CNNs in hardware, with FPGA devices serving as a high-performance and energy-efficient platform of choice. As opposed to a dense computation where every...
The continued need for improvements in accuracy, throughput, and efficiency of Deep Neural Networks has resulted in a multitude of methods that make the most of custom architectures on FPGAs. These include the creation of hand-crafted networks and the use of quantization and pruning to reduce extraneous network parameters. However, with the potential of static solutions already well exploited,...
The contribution addresses the topic of time-series recognition, specifically comparing the conventional approach of manual feature extraction with contemporary classification methods that leverage features acquired through the training process. Employing automated feature extraction software, we attained a high-dimensional representation of a time-series, obviating the necessity of...
Universal approximation theorems are the foundations of classical neural networks, providing theoretical guarantees that the latter are able to approximate maps of interest. Recent results have shown that this can also be achieved in a quantum setting, whereby classical functions can be approximated by parameterised quantum circuits. We provide here precise error bounds for specific...
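For reference, the classical statement being generalized (Cybenko; Hornik): any continuous function on a compact set K can be approximated to arbitrary sup-norm accuracy by a one-hidden-layer network with a suitable nonpolynomial activation. The quantum error bounds the abstract refers to are not reproduced here.

```latex
\forall f \in C(K),\ \forall \varepsilon > 0,\
\exists N,\ \{a_i, w_i, b_i\}_{i=1}^{N} :\quad
\sup_{x \in K}\,\Bigl| f(x) - \sum_{i=1}^{N} a_i\,
  \sigma\bigl(w_i^{\top} x + b_i\bigr) \Bigr| < \varepsilon .
```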
Deep learning techniques have demonstrated remarkable performance in super resolution (SR) tasks for enhancing image resolution and granularity. These architectures extract image features with a convolutional block and add the extracted features to the upsampled input image transported through a skip connection, before converting from depth to a higher-resolution space. However, SR can...
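One common arrangement of the blocks just described, as a hedged sketch: convolutional feature extraction, a depth-to-space rearrangement, and a skip connection carrying a conventionally upsampled copy of the input; the sizes and exact ordering are illustrative.

```python
import tensorflow as tf

scale = 2
inputs = tf.keras.Input(shape=(32, 32, 1))
feat = tf.keras.layers.Conv2D(scale**2, 3, padding="same")(inputs)   # features
hi = tf.keras.layers.Lambda(
    lambda t: tf.nn.depth_to_space(t, scale))(feat)                  # depth -> space
skip = tf.keras.layers.UpSampling2D(scale, interpolation="bilinear")(inputs)
outputs = tf.keras.layers.Add()([hi, skip])                          # residual SR
model = tf.keras.Model(inputs, outputs)
```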
Pruning enhances neural network hardware efficiency by zeroing out low-magnitude weights. In order to take full advantage of pruning, efficient implementations of sparse matrix multiplication are required. The current hls4ml implementations of sparse matrix multiplication rely on either the built-in high-level synthesis zero suppression operations or a coordinate list representation, which faces...
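To make the representational choice concrete, the sketch below is a compressed-sparse-row (CSR) matrix-vector product in plain Python, the kind of kernel a pruned Dense layer reduces to; CSR is shown as a contrast to the coordinate-list layout mentioned above, not as the hls4ml implementation.

```python
import numpy as np

def csr_matvec(data, indices, indptr, x):
    """y = A @ x with A in CSR form (nonzero values, their column
    indices, and per-row pointers into those arrays)."""
    n_rows = len(indptr) - 1
    y = np.zeros(n_rows)
    for r in range(n_rows):
        for k in range(indptr[r], indptr[r + 1]):  # touch only nonzeros
            y[r] += data[k] * x[indices[k]]
    return y

# A = [[1, 0, 2], [0, 0, 3]] stored as CSR:
data, indices, indptr = [1.0, 2.0, 3.0], [0, 2, 2], [0, 2, 3]
print(csr_matvec(data, indices, indptr, np.array([1.0, 1.0, 1.0])))  # [3. 3.]
```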
I will consider randomly initialized controlled ResNets and show that in the infinite-width-depth limit and under appropriate rescaling of weights and biases, these architectures converge weakly to Gaussian processes indexed on path-space and with kernels realised as solutions of certain data-dependent PDEs, varying according to the choice of activation function. In the special case where the...