Custom FPGA dataflow accelerators for DNN inference can enable unprecedented performance and efficiency for many applications. Dataflow accelerator compilers, such as the FINN framework, have improved in recent years and allow practitioners to explore this technology without requiring in-depth FPGA knowledge.
However, the overall design process remains quite tedious, time-consuming, and...
As the demand for efficient machine learning on resource-limited devices grows, model compression techniques like pruning and quantization have become increasingly vital. Despite their importance, these methods are typically developed in isolation, and while some libraries attempt to offer unified interfaces for compression, they often lack support for deployment tools such as hls4ml. To...
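As a minimal sketch of the two compression steps named above (pure Python; function names are illustrative and imply no particular library's API), magnitude pruning zeroes the smallest-magnitude weights and uniform quantization maps the survivors onto a low-bit integer grid:

```python
# Illustrative sketch of magnitude pruning + symmetric uniform quantization.
# Names and thresholds are toy choices, not any library's implementation.

def prune_by_magnitude(weights, sparsity):
    """Zero out (at least) the smallest-magnitude fraction of weights."""
    k = int(len(weights) * sparsity)
    threshold = sorted(abs(w) for w in weights)[k - 1] if k > 0 else -1.0
    return [0.0 if abs(w) <= threshold else w for w in weights]

def quantize_uniform(weights, n_bits=8):
    """Symmetric uniform quantization to signed n_bits integers."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

w = [0.02, -0.5, 0.75, -0.01, 0.3, -0.12]
wp = prune_by_magnitude(w, 0.5)   # half the weights become zero
q, s = quantize_uniform(wp)       # int8 codes plus a shared scale
deq = [qi * s for qi in q]        # dequantized values for error checking
```

The two steps interact: pruning changes the weight range and hence the quantization scale, which is one reason unified tooling for compression plus deployment matters.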
As neural networks (NNs) are increasingly used to provide
edge intelligence, there is a growing need to make the edge devices
that run them robust to faults. Edge devices must mitigate the resulting
hardware failures while maintaining strict constraints on power, energy,
latency, throughput, memory size, and computational resources. Edge
NNs require fundamental changes in model...
On-chip learning has the potential to unlock low-latency, low-power, and continuously adaptive AI directly on edge devices. However, research in this area remains limited by the lack of accessible hardware toolchains that support backpropagation. To address this gap, we propose ENABOL, a hardware-efficient extension of the HLS4ML toolchain that enables customizable backpropagation support...
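The core computation any such backpropagation engine must implement can be sketched for a single dense layer with ReLU (plain Python, purely illustrative; this is not the ENABOL implementation):

```python
# Forward and backward pass of one dense layer with ReLU -- the building
# block an on-chip learning engine must realize in hardware.

def forward(W, b, x):
    """z = W x + b, followed by ReLU."""
    z = [sum(W[i][j] * x[j] for j in range(len(x))) + b[i]
         for i in range(len(b))]
    a = [max(0.0, zi) for zi in z]
    return z, a

def backward(W, x, z, grad_a):
    """Given dLoss/da, return gradients w.r.t. W, b, and the layer input."""
    grad_z = [g if zi > 0 else 0.0 for g, zi in zip(grad_a, z)]
    grad_W = [[gz * xj for xj in x] for gz in grad_z]
    grad_b = list(grad_z)
    grad_x = [sum(W[i][j] * grad_z[i] for i in range(len(grad_z)))
              for j in range(len(x))]
    return grad_W, grad_b, grad_x

W = [[0.5, -0.2], [0.1, 0.3]]
b = [0.1, -0.05]
x = [1.0, 2.0]
z, a = forward(W, b, x)
# gradient of the scalar "loss" a[0] with respect to all parameters
gW, gb, gx = backward(W, x, z, [1.0, 0.0])
```

The backward pass reuses the forward activations, which is exactly the extra buffering a hardware toolchain has to budget for.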
Neural networks with a latency requirement on the order of microseconds, like the ones used at the CERN Large Hadron Collider, are typically deployed on FPGAs fully pipelined with an initiation interval (II) of 1. A bottleneck for the deployment of such neural networks is area utilization, which is directly related to the required constant matrix-vector multiplication (CMVM) operations. In this work, we propose an efficient...
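One standard reason CMVM area depends on the weight values, sketched here in Python for non-negative integer constants (a simplifying assumption of this toy), is that each constant multiplier can be replaced by a shift-and-add network derived from the constant's set bits:

```python
# Toy model of multiplier-free constant multiplication: each set bit of the
# constant becomes one shifted copy of the input, summed by adders.

def shift_add_terms(c):
    """Bit positions of the set bits of a non-negative constant c."""
    terms, bit = [], 0
    while c:
        if c & 1:
            terms.append(bit)
        c >>= 1
        bit += 1
    return terms

def const_mul(x, c):
    """x * c realized as shifts and adds only (no multiplier)."""
    return sum(x << s for s in shift_add_terms(c))

def const_matvec(A, x):
    """CMVM built entirely from shift-and-add constant multiplies."""
    return [sum(const_mul(xj, a) for a, xj in zip(row, x)) for row in A]
```

In real designs the adder count, not a DSP count, then drives area, and sharing common subexpressions across rows reduces it further.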
The ATLAS Level-0 Global Trigger is a mission-critical system that aims to take advantage of the full calorimeter granularity during Run-4 and beyond. Level-0 Global will execute a cascade of trigger algorithms combining both calorimeter and muon information. Within the Next Generation Trigger project at CERN there is a dedicated work package (WP2.1) exploring large deployment of...
In the era of continuous data generation, real-time processing of data streams has become crucial for timely, adaptive, and context-aware decision-making. However, maintaining effective learning models in such dynamic environments requires carefully balancing prediction performance, transparency and energy consumption.
In the talk, we will present two new state-of-the-art methods for...
The widespread deployment of embedded ML systems has created a need for resilient, fault-tolerant hardware and software capable of operating in inherently noisy conditions. While the standardization of low-precision (≤ 8-bit) datatypes has allowed for reduced training and inference costs and increased interoperability across commercial accelerators, clear guidelines for robust implementation...
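Robustness studies of this kind typically rely on fault injection. A minimal sketch using Python's `struct` module flips a single bit of a float32 value to emulate a single-event upset (the function name is illustrative):

```python
import struct

def flip_bit_float32(x, bit):
    """Emulate a single-event upset: flip one bit (0..31) of a float32."""
    (u,) = struct.unpack("<I", struct.pack("<f", x))   # reinterpret as uint32
    (y,) = struct.unpack("<f", struct.pack("<I", u ^ (1 << bit)))
    return y

sign_fault = flip_bit_float32(1.0, 31)      # sign bit: large error
mantissa_fault = flip_bit_float32(1.0, 0)   # low mantissa bit: tiny error
```

The asymmetry between the two faults is the core of the robustness problem: the damage depends entirely on which bit of which datatype is hit, which is why datatype choice and guidelines matter.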
The rising computational demands of increasing data rates and complex machine learning (ML) algorithms in large-scale scientific experiments have driven the adoption of the Services for Optimized Network Inference on Coprocessors (SONIC) framework. SONIC accelerates ML inference by offloading tasks to local or remote coprocessors, optimizing resource utilization. Its portability across diverse...
Most current machine learning (ML) applications are purely data-driven solutions with little consideration for the underlying problem dynamics, limiting them to in-distribution settings. To tackle this limitation, a stream of literature is emerging to address out-of-distribution (OOD) performance: algorithmic alignment, which focuses on embedding algorithmic structures into ML architectures...
General matrix-vector multiplication (GEMV) operations are a common building block in many deep learning models, particularly for large dense layers found in convolutional neural networks (CNNs) and multi-layer perceptrons (MLPs). Despite their importance, GEMV kernels have historically underperformed compared to matrix-matrix (GEMM) operations due to their lower arithmetic intensity and limited data reuse, making...
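The arithmetic-intensity gap can be made concrete with back-of-the-envelope FLOP-per-byte counts (assuming 4-byte elements and that every operand moves through memory exactly once, a deliberate simplification):

```python
def gemv_intensity(m, n, bytes_per_elt=4):
    """FLOPs per byte of y = A x for an m x n matrix A."""
    flops = 2 * m * n                           # one multiply-add per element
    data = bytes_per_elt * (m * n + n + m)      # read A and x, write y
    return flops / data

def gemm_intensity(n, bytes_per_elt=4):
    """FLOPs per byte of C = A B for square n x n matrices."""
    flops = 2 * n ** 3
    data = bytes_per_elt * 3 * n ** 2           # read A, B; write C once
    return flops / data
```

GEMV intensity is bounded by about 0.5 FLOPs/byte regardless of size, so it stays memory-bound, whereas GEMM intensity grows linearly with n, which is the quantitative version of the "limited data reuse" point above.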
AXOL1TL is an anomaly detection (AD) trigger algorithm integrated into the Global Trigger (GT) of the CMS Level-1 Trigger (L1T) system since 2024. The GT reduces the event rate from proton-proton collisions at the LHC, lowering it from 40 MHz to 100 kHz within a fixed latency of 50 ns. The AD algorithm, implemented in the FPGA firmware of the GT board, uses an autoencoder to assign an anomaly...
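As a toy illustration of the reconstruction-error score such an autoencoder computes (all weights and inputs below are illustrative, not the AXOL1TL model): events close to the learned manifold reconstruct well and score low, while events off it score high.

```python
# Toy linear "autoencoder": encode onto a single learned direction w
# (unit norm), decode back, and score by squared reconstruction error.

def encode(x, w):
    return sum(xi * wi for xi, wi in zip(x, w))

def decode(z, w):
    return [z * wi for wi in w]

def anomaly_score(x, w):
    xr = decode(encode(x, w), w)
    return sum((a - b) ** 2 for a, b in zip(x, xr))

w = [0.6, 0.8]            # learned principal direction (toy)
typical = [3.0, 4.0]      # lies along w -> near-zero reconstruction error
anomalous = [4.0, -3.0]   # orthogonal to w -> large reconstruction error
```

A threshold on this score then becomes the trigger decision; the hard part in the L1T context is evaluating the network within the fixed-latency budget.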
The absence of BSM physics discoveries at the LHC suggests new physics could lie outside current trigger schemes. By applying unsupervised ML-based anomaly detection, we gain a model-agnostic way of spotting anomalous signatures that deviate from the current trigger's expectations. Here we introduce a Run-3 trigger chain that embeds fast anomaly detection algorithms in both hardware and...
At the Phase-2 Upgrade of the CMS Level-1 Trigger (L1T), particles will be reconstructed by linking charged particle tracks with clusters in the calorimeters and muon tracks from the muon stations. The 200 pileup interactions will be mitigated using primary vertex reconstruction for charged particles and a weighting for neutral particles based on the distribution of energy in a small area. Jets...
Belle II is a luminosity frontier experiment located at the SuperKEKB asymmetric $e^+ e^-$ collider, operating at the $\Upsilon(4S)$ resonance. The $\tau$ physics program at Belle II involves both probes of new physics and precision measurements of standard model parameters with large statistics. SuperKEKB is projected to reach a luminosity of $6\times 10^{35}~\text{cm}^{-2}\text{s}^{-1}$ in...
The High Luminosity upgrade of the Large Hadron Collider (HL-LHC) presents a demanding environment for real-time data processing, with substantially increased event rates requiring faster and more efficient trigger systems. This study explores the deployment of graph neural networks (GNNs) on field-programmable gate arrays (FPGAs) for fast and accurate inference within future muon trigger...
The ATLAS trigger system will undergo a comprehensive upgrade in advance of the HL-LHC programme. In order to deal with the increased data bandwidth, trigger algorithms will be required to satisfy stricter latency requirements. We propose a method to speed up the current calorimeter-only preselection step and to aid trigger decisions for hadronic signals containing jets.
We demonstrate the use...
Optimized FPGA implementations of tiny neural networks are crucial for low-latency and hardware-efficient inference for a variety of applications. Neural networks based on lookup tables (LUTs) are a standard technique for such problems due to their hardware efficiency and strong expressivity. However, such networks are often difficult to scale up as their resource usage scales exponentially...
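The exponential scaling mentioned above follows directly from how a LUT network is built: each neuron with binary inputs is tabulated as a truth table, so its LUT has one entry per input combination. A small illustrative sketch (names and weights are toy choices):

```python
from itertools import product

def neuron_to_lut(weights, bias):
    """Tabulate a binary-input threshold neuron as a lookup table.
    The table has 2**fan_in entries -- the exponential scaling bottleneck."""
    fan_in = len(weights)
    table = {}
    for bits in product([0, 1], repeat=fan_in):
        s = sum(w * b for w, b in zip(weights, bits)) + bias
        table[bits] = 1 if s > 0 else 0
    return table

lut = neuron_to_lut([1.0, -2.0, 0.5], bias=0.25)   # fan-in 3 -> 8 entries
```

Doubling the fan-in squares the table size, which is why scaling LUT-based networks up typically requires bounding fan-in and composing many small tables.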
Modern foundation models (FMs) have pushed the frontiers of language, vision, and multi-modal tasks by training ever-larger neural networks (NNs) on unprecedented volumes of data. The use of FMs has yet to be established in collider physics, which lacks both a comparably sized, general-purpose dataset on which to pre-train universal event representations and a clear, demonstrable need....
The analysis of point cloud data, for example signals from charged particles recorded by detectors in high energy physics (HEP) experiments, can be significantly enhanced and accelerated by the application of machine learning models. In recent years, transformer architectures have come into focus as offering excellent model performance. However, for traditional transformers, the need to compute...
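The cost being referred to is that full self-attention scores every pair of points. A minimal dot-product self-attention over a point cloud (plain Python, illustrative; no masking, no learned projections) makes the n-squared structure explicit:

```python
import math

def self_attention(X):
    """Full dot-product self-attention over a point cloud X (n points, d dims).
    The inner loop over j scores all n*n query-key pairs -- the quadratic cost."""
    n, d = len(X), len(X[0])
    out = []
    for i in range(n):
        scores = [sum(X[i][k] * X[j][k] for k in range(d)) / math.sqrt(d)
                  for j in range(n)]
        m = max(scores)                      # subtract max for stable softmax
        w = [math.exp(s - m) for s in scores]
        Z = sum(w)
        out.append([sum(w[j] * X[j][k] for j in range(n)) / Z
                    for k in range(d)])
    return out

out = self_attention([[1.0, 0.0], [0.0, 1.0]])
```

Each output row is a convex combination of the input points, so for n detector hits both compute and memory for the score matrix grow as n squared, which motivates the linear-attention variants this line of work explores.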
The Interaction Network (IN) algorithm has shown great promise for particle tracking applications at the Large Hadron Collider (LHC), where identifying complex particle trajectories from raw detector data is a computationally intensive task. IN leverages graph-based representations of detector hits to learn relationships between particle interactions, making it well-suited for this domain....
The trigger systems of ATLAS and CMS currently reject vast numbers of potentially valuable collision events due to their conservative, static designs, a limitation that directly hampers discovery potential. We propose an alternative to these rigid, hand-tuned menus with an autonomous controller capable of dynamically optimizing trigger performance in real time.
In this work, we demonstrate...
Machine Learning (ML) techniques are increasingly applied to the optimization of complex computing systems, but their integration into core low-level system mechanisms remains limited. A key barrier is the lack of accessible, high-performance interfaces at the boundary between software and hardware, as well as of hardware-offloaded ML inference at full system speed. In this presentation, we...
Tuning hyperparameters of ML models, especially large ML models, can be time-consuming and computationally expensive. As a potential solution, several recent papers have explored hyperparameter transfer. Under certain conditions, the optimal hyperparameters of a small model are also optimal for larger models. One can therefore tune only the small model and transfer the hyperparameters to the...
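One commonly cited transfer rule, sketched here as an assumption rather than the method of this talk, is the inverse-width scaling of hidden-layer learning rates used in muP-style parameterizations (the exact rule depends on the parameterization, optimizer, and layer type):

```python
def transfer_lr(base_lr, base_width, target_width):
    """Rule-of-thumb hyperparameter transfer: scale the hidden-layer
    learning rate inversely with model width. Illustrative only -- the
    correct scaling depends on the chosen parameterization."""
    return base_lr * base_width / target_width

# tune at width 128, then deploy the learning rate at width 1024
small_lr = 0.01
large_lr = transfer_lr(small_lr, base_width=128, target_width=1024)
```

The practical payoff is that the expensive sweep runs only on the small proxy model; the large model is trained once with the transferred values.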
Graph Neural Networks (GNNs), particularly Interaction Networks (INs), have shown exceptional performance for jet tagging at the CERN High-Luminosity Large Hadron Collider. However, their computational complexity and irregular memory access patterns pose significant challenges for deployment on FPGAs in hardware trigger systems, where strict latency and resource constraints apply.
In this...
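The irregular memory access mentioned above comes from the gather-scatter pattern of message passing. One Interaction Network step can be sketched with scalar node features and toy stand-ins for the learned edge and node functions (plain Python, illustrative only):

```python
def in_step(h, edges):
    """One Interaction Network step on scalar node features h.
    phi_e and phi_n are toy stand-ins for learned MLPs."""
    def phi_e(h_sender, h_receiver):      # edge (relational) function
        return h_sender - h_receiver
    def phi_n(h_node, agg):               # node (object) function
        return h_node + 0.5 * agg
    agg = [0.0] * len(h)
    for s, r in edges:                    # gather: messages flow sender -> receiver
        agg[r] += phi_e(h[s], h[r])
    return [phi_n(h[v], agg[v]) for v in range(len(h))]

h_new = in_step([1.0, 2.0, 4.0], [(0, 1), (2, 1)])
```

On an FPGA the edge list is data-dependent, so the `agg[r] +=` accumulation is exactly the irregular access that must be scheduled around fixed latency budgets.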
The Smartpixels project is a coordinated effort to co-design pixel ASICs, design tools, ML algorithms, and sensors for on-detector data reduction, motivated by the technical challenges of current and future colliders. The drive to greater precision requires smaller pixel pitch, which together with higher event rates arising from pileup and/or beam-induced background generates petabytes of data...
We conduct a systematic study of quantum-inspired Tensor Network (TN) models, namely Matrix Product States (MPS) and Tree Tensor Networks (TTN), for real-time jet tagging in high-energy physics, with a focus on low-latency deployment on FPGAs. Motivated by the strict computational demands of the HL-LHC Level-1 Trigger system, we explore TN architectures as compact and interpretable alternatives to deep...
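What makes an MPS cheap to evaluate is that classification reduces to a left-to-right sweep of small matrix-vector products. A minimal contraction sketch (plain Python; tensor shapes and values are toy choices):

```python
# MPS evaluation with a product-state feature map: each site tensor is a
# list, over the physical index, of (bond_in x bond_out) matrices.

def matvec(v, M):
    return [sum(v[i] * M[i][j] for i in range(len(v)))
            for j in range(len(M[0]))]

def site_matrix(site, f):
    """Weight the per-physical-index matrices by the input feature vector f."""
    rows, cols = len(site[0]), len(site[0][0])
    return [[sum(f[p] * site[p][r][c] for p in range(len(f)))
             for c in range(cols)] for r in range(rows)]

def mps_evaluate(sites, feats):
    v = [1.0]                         # trivial left boundary bond
    for site, f in zip(sites, feats): # one small matvec per site
        v = matvec(v, site_matrix(site, f))
    return v                          # final bond carries the output scores

sites = [
    [[[1.0, 0.0]], [[0.0, 1.0]]],        # site 0: physical dim 2, bonds 1x2
    [[[1.0], [0.0]], [[0.0], [1.0]]],    # site 1: physical dim 2, bonds 2x1
]
```

Cost per site is bond_dim squared times physical dim, so latency grows linearly in sequence length with small fixed-size kernels, which is what makes TNs attractive for streaming FPGA pipelines.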
Hadronic calorimeters are a key part of high energy physics experiments. Traditionally, they rely on high granularity to improve performance, but this leads to various challenges in terms of cost, energy consumption, and output data volume. Moreover, current detectors do not have the capability of exploiting temporal information of the shower development, as the time frame for pattern...
Inference of standard convolutional neural networks (CNNs) on FPGAs often incurs high latency and long initiation intervals due to the nested loops required to slide filters across the full input, especially when the input dimensions are large. However, in some datasets, meaningful signals may occupy only a small fraction of the input, sometimes just a few percent of the total pixels or...
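One way to exploit that sparsity, sketched here as a toy (function name and patch policy are illustrative, not the method of this work), is to convolve only the windows that contain above-threshold activity instead of sliding over the full input:

```python
def active_patches(image, threshold, patch):
    """Top-left corners of patch x patch windows centred (clamped to the
    image) on above-threshold pixels -- the only windows worth convolving."""
    n = len(image)                     # assumes a square n x n input
    hits = [(r, c) for r in range(n) for c in range(n)
            if image[r][c] > threshold]
    corners = set()
    for r, c in hits:
        corners.add((max(0, min(r - patch // 2, n - patch)),
                     max(0, min(c - patch // 2, n - patch))))
    return corners

img = [[0.0] * 8 for _ in range(8)]
img[4][4] = 1.0                        # a single active pixel
corners = active_patches(img, threshold=0.5, patch=3)
```

For this 8x8 input a full 3x3 sweep visits 36 positions while only one window is active, so the loop trip count, and with it latency, tracks the signal occupancy rather than the input size.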
Reflection High-Energy Electron Diffraction (RHEED) is a common diffraction-based surface characterization technique for analyzing the properties of crystalline materials that are grown using a thin-film deposition technique like pulsed-laser deposition (PLD) or molecular-beam epitaxy (MBE). In this work, we design an FPGA-accelerated machine learning (ML) algorithm to perform real-time...
Transformers are state-of-the-art model architectures widely used across many areas of machine learning. However, the performance of such architectures is less well explored in ultra-low-latency domains where deployment on FPGAs or ASICs is required. Such domains include the trigger and data acquisition systems of the LHC experiments.
We present a transformer-based algorithm...
The LHCb Upgrade II will operate at a data rate of 200 Tb/s, requiring efficient real-time data reduction. A major challenge of this pipeline is the transfer of full timing information from the frontend Electromagnetic Calorimeter (ECAL) to the backend for processing, which is critical for resolving pile-up, suppressing background, and enhancing energy resolution. Due to the data rate, full...
With increasing beam background levels at Belle II, which have already been observed due to the world-record instantaneous luminosities achieved by SuperKEKB and which are expected to rise further, an upgrade of the current Level 1 (L1) trigger algorithms is necessary to handle the evolving conditions. In this work, we present an upgraded L1 electromagnetic calorimeter trigger, based on Graph...
The PVFinder algorithm employs a hybrid deep neural network (DNN) approach to reconstruct primary vertices (PVs) in proton-proton collisions at the LHC, addressing the complexities of high pile-up environments in LHCb and ATLAS experiments. By integrating fully connected layers with a UNet architecture, PVFinder's end-to-end tracks-to-hist DNN processes charged track parameters to predict PV...
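The "tracks-to-hist" idea can be illustrated with a toy kernel-density target (plain Python; binning, widths, and the function name are illustrative, not the PVFinder training target): each track deposits a Gaussian kernel at its z position, and vertices appear as peaks of the summed histogram.

```python
import math

def tracks_to_hist(z_tracks, z_sigmas, n_bins, z_min, z_max):
    """Toy tracks-to-hist target: sum per-track Gaussian kernels on a z grid."""
    edges = [z_min + (z_max - z_min) * i / n_bins for i in range(n_bins + 1)]
    centers = [(a + b) / 2 for a, b in zip(edges, edges[1:])]
    hist = [0.0] * n_bins
    for z, s in zip(z_tracks, z_sigmas):
        for i, c in enumerate(centers):
            hist[i] += math.exp(-0.5 * ((c - z) / s) ** 2)
    return centers, hist

# three tracks clustered near z = 0 (a vertex) plus one stray track
centers, hist = tracks_to_hist([0.0, 0.05, -0.02, 5.0], [0.05] * 4,
                               n_bins=100, z_min=-10.0, z_max=10.0)
peak_z = centers[max(range(len(hist)), key=hist.__getitem__)]
```

A DNN trained against such targets learns to map raw track parameters directly to the peak structure, sidestepping explicit combinatorial vertex finding.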
The Project 8 experiment aims to directly probe the neutrino mass by precisely measuring the energy spectrum of beta electrons emitted in the decay of tritium. The collaboration has pioneered the cyclotron radiation emission spectroscopy technique (CRES), which measures the energy of single electrons by detecting the cyclotron radiation they emit in a magnetic field. Traditional methods for...
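The physics behind CRES can be made concrete with the relativistic cyclotron relation f = eB / (2 pi gamma m): measuring the emitted frequency fixes gamma and hence the electron's kinetic energy. A worked inversion (the function name is illustrative; constants are CODATA values):

```python
import math

def cres_energy_keV(f_hz, B_T):
    """Kinetic energy (keV) from a measured cyclotron frequency, by
    inverting f = e B / (2 pi gamma m), then E_kin = (gamma - 1) m c^2."""
    e = 1.602176634e-19        # elementary charge [C]
    m = 9.1093837015e-31       # electron mass [kg]
    c = 299792458.0            # speed of light [m/s]
    gamma = e * B_T / (2 * math.pi * m * f_hz)
    return (gamma - 1.0) * m * c * c / e / 1e3

# near the tritium endpoint (~18.6 keV) in a 1 T field, f is ~27 GHz
E = cres_energy_keV(2.70e10, 1.0)
```

Note the inverse relation: higher-energy electrons radiate at lower frequency, so the sub-eV precision Project 8 targets translates into a demanding real-time frequency-resolution requirement.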
High-resolution electron microscopy generates large volumes of pixel detector data due to beam rates reaching $10^7$ to $10^{10}$ electrons per second directed at the sample. Of this data, only the electron entry point into the silicon detector prior to scattering is typically of interest for downstream analysis. Precise knowledge of these entry points is particularly important in electron...