This study introduces a novel transformer model optimized for large-scale point cloud processing in scientific domains such as high-energy physics (HEP) and astrophysics. Addressing the limitations of graph neural networks and standard transformers, our model integrates local inductive bias and achieves near-linear complexity with hardware-friendly regular operations. One contribution of this...
One of the most significant challenges in tracking reconstruction is the reduction of "ghost tracks," which are composed of false hit combinations in the detectors. Performing tracking reconstruction in real time at 30 MHz introduces the difficulty of meeting both high efficiency and high throughput requirements. A single-layer feed-forward neural network (NN) has been developed and trained...
Deep Learning (DL) applications for gravitational wave (GW) physics are becoming increasingly common without the infrastructure to validate them at scale or deploy them in real time. The challenge of gravitational waves requires a real-time time-series workflow. With ever more sensitive GW observing runs beginning in 2023-25 and progressing through the next decade, ever-increasing...
Computing demands for large scientific experiments, including experiments at the Large Hadron Collider and the future DUNE neutrino detector, will increase dramatically in the next decades. Heterogeneous computing provides a way to meet these increased computing demands beyond the limitations brought on by the end of Dennard scaling. However, to effectively exploit heterogeneous compute,...
The Deep(er)RICH architecture integrates Swin Transformers and normalizing flows, and demonstrates significant advancements in particle identification (PID) and fast simulation. Building on the earlier DeepRICH model, Deep(er)RICH extends its capabilities across the entire kinematic region covered by the DIRC detector in the \textsc{GlueX} experiment. It learns PID...
Recently, compelling evidence for the emission of high-energy neutrinos from our host Galaxy - the Milky Way - was reported by IceCube, a neutrino detector instrumenting a cubic kilometer of glacial ice at the South Pole. This breakthrough observation is enabled by advances in AI, including a physics-driven deep learning method capable of exploiting available symmetries and domain knowledge....
This R&D project, initiated by the DOE Nuclear Physics AI-Machine Learning initiative in 2022, explores advanced AI technologies to address data processing challenges at RHIC and future EIC experiments. The main objective is to develop a demonstrator capable of efficient online identification of heavy-flavor events in proton-proton collisions (~1 MHz) based on their decay topologies, while...
Attention-based transformers are ubiquitous in machine learning applications from natural language processing to computer vision. In high energy physics, one central application is to classify collimated particle showers in colliders based on the particle of origin, known as jet tagging. In this work, we study the interpretability and prospects for acceleration of Particle Transformer (ParT),...
In this work, we present the Scalable QUantization-Aware Real-time Keras (S-QUARK), an advanced quantization-aware training (QAT) framework for efficient FPGA inference built on top of Keras v3, supporting the TensorFlow, JAX, and PyTorch backends.
The framework inherits all the benefits of the High Granularity Quantization (HGQ) library and extends it to support fixed-point numbers with...
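As a rough illustration of what fixed-point quantization-aware training involves (a minimal sketch, not the S-QUARK or HGQ API; the helper name and bit-width parameters below are hypothetical), a fake-quantization step rounds and clips weights to a fixed-point grid in the forward pass, while gradients bypass the rounding via a straight-through estimator:

import numpy as np

def fake_quantize_fixed(x, int_bits=1, frac_bits=6):
    """Round and clip x to a signed fixed-point grid with int_bits integer and
    frac_bits fractional bits. Hypothetical helper for illustration only."""
    scale = 2.0 ** frac_bits
    max_val = 2.0 ** int_bits - 1.0 / scale
    min_val = -(2.0 ** int_bits)
    return np.clip(np.round(x * scale) / scale, min_val, max_val)

# During QAT, the forward pass uses the quantized weights; the backward pass
# treats the rounding as the identity (straight-through estimator).
w = np.random.default_rng(0).normal(size=(4, 3))
w_q = fake_quantize_fixed(w)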
The next phase of high energy particle physics research at CERN will involve the High-Luminosity Large Hadron Collider (HL-LHC). In preparation for this phase, the ATLAS Trigger and Data AcQuisition (TDAQ) system will undergo upgrades to the online software tracking capabilities. Studies are underway to assess a heterogeneous computing farm deploying GPUs and/or FPGAs, together with the...
An Artificial Intelligence (AI) model will spend “90% of its lifetime in inference.” Fully utilizing coprocessors, such as FPGAs or GPUs, for AI inference requires O(10) CPU cores to feed work to the coprocessors. Traditional data analysis pipelines will not be able to use the coprocessors effectively and efficiently to their full potential. To allow for distributed access to...
Processing large volumes of sparse neutrino interaction data is essential to the success of liquid argon time projection chamber (LArTPC) experiments such as DUNE. High rates of radiological background must be eliminated to extract critical information for track reconstruction and downstream analysis. Given the computational load of this rejection, and potential real time constraints of...
Detector simulation is a key component of physics analysis and related activities in particle physics. In the upcoming High Luminosity LHC era, simulation will be required to use a smaller fraction of computing in order to satisfy resource constraints, at the same time as experiments are being upgraded with new, higher-granularity detectors, which require significantly more resources to...
The demand for machine learning algorithms on edge devices, such as Field-Programmable Gate Arrays (FPGAs), arises from the need to process and intelligently reduce vast amounts of data in real-time, especially in large-scale experiments like the Deep Underground Neutrino Experiment (DUNE). Traditional methods, such as thresholding, clustering, multiplicity checks, or coincidence checks,...
Detecting quenches in superconducting (SC) magnets by non-invasive means is a challenging real-time process that involves capturing and sorting through physical events that occur at different frequencies and appear as various signal features. These events may be correlated across instrumentation type, thermal cycle, and ramp. Together, these events build a more complete picture of continuous...
Reinforcement Learning (RL) is a promising approach for the autonomous AI-based control of particle accelerators. The real-time requirements of these algorithms often cannot be satisfied with conventional hardware platforms. In this contribution, the unique KINGFISHER platform being developed at KIT will be presented. Based on the novel AMD-Xilinx Versal platform, this system provides...
AI Red Teaming, an offshoot of traditional cybersecurity practices, has emerged as a critical tool for ensuring the integrity of AI systems. An underexplored area has been the application of AI Red Teaming methodologies to scientific applications, which increasingly use machine learning models in their workflows. I'll explain why this is important and how AI Red Teaming can highlight...
Neural networks with a latency requirement on the order of microseconds, like the ones used at the CERN Large Hadron Collider, are typically deployed on FPGAs fully unrolled. A bottleneck for the deployment of such neural networks is area utilization, which is directly related to the number of Multiply-Accumulate (MAC) operations in matrix-vector multiplications. In this work, we present...
Characterizing the loss of a neural network can provide insights into local structure (e.g., smoothness of the so-called loss landscape) and global properties of the underlying model (e.g., generalization performance). Inspired by powerful tools from topological data analysis (TDA) for summarizing high-dimensional data, we are developing tools for characterizing the underlying shape (or...
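As a simple point of reference for what "characterizing the loss" can mean in practice (an illustrative sketch only, not the TDA-based tooling described above; the function name and sampling scheme are assumptions), one can sample the loss on a two-dimensional slice through parameter space along random directions:

import numpy as np

def loss_surface_slice(loss_fn, params, radius=1.0, steps=21, seed=0):
    """Sample loss_fn on a 2D slice of parameter space spanned by two random
    unit directions around params. The TDA summaries discussed above would
    operate on richer representations of this kind of data."""
    rng = np.random.default_rng(seed)
    d1 = rng.normal(size=params.shape)
    d2 = rng.normal(size=params.shape)
    d1 /= np.linalg.norm(d1)
    d2 /= np.linalg.norm(d2)
    alphas = np.linspace(-radius, radius, steps)
    return np.array([[loss_fn(params + a * d1 + b * d2) for b in alphas]
                     for a in alphas])

# Example with a toy quadratic loss.
grid = loss_surface_slice(lambda p: float(np.sum(p ** 2)), np.ones(10))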
Matched-filtering detection techniques for gravitational-wave (GW) signals in ground-based interferometers rely on having well-modeled templates of the GW emission. Such techniques have been traditionally used in searches for compact binary coalescences (CBCs) and have been employed in all known GW detections so far. However, interesting science cases aside from compact mergers do not yet have...
We present the development, deployment, and initial recorded data of an unsupervised autoencoder trained for unbiased detection of new physics signatures in the CMS experiment during LHC Run 3. The Global Trigger makes the final hardware decision to read out or discard data from each LHC collision, with collisions occurring at a rate of 40 MHz, within nanosecond latency constraints. The anomaly detection...
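The basic scoring idea behind autoencoder-based anomaly detection is that poorly reconstructed events are flagged as anomalous. The sketch below is schematic only: the linear encoder/decoder, input dimensions, and threshold are placeholders, not the deployed Global Trigger model.

import numpy as np

rng = np.random.default_rng(0)
# Placeholder linear "autoencoder": 16 trigger inputs compressed to 4 latent values.
W_enc = rng.normal(size=(16, 4))
W_dec = rng.normal(size=(4, 16))

def anomaly_score(x):
    """Mean squared reconstruction error per event; higher means more anomalous."""
    x_hat = (x @ W_enc) @ W_dec
    return np.mean((x - x_hat) ** 2, axis=-1)

events = rng.normal(size=(1000, 16))
keep = anomaly_score(events) > 5.0   # hypothetical threshold tuned to a target trigger rate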
The rapidly developing frontiers of additive manufacturing, especially multi-photon lithography, create a constant need for optimization of new process parameters. Multi-photon lithography is a 3D printing technique which uses the nonlinear absorption of two or more photons from a high intensity light source to induce highly confined polymerization. The process can 3D print structures with...
Coherent diffractive imaging (CDI) techniques like ptychography enable nanoscale imaging, bypassing the resolution limits of lenses. Yet, the need for time-consuming iterative phase recovery hampers real-time imaging. While supervised deep learning strategies have increased reconstruction speed, they sacrifice image quality. Furthermore, these methods’ demand for extensive labeled training...
In the search for new physics, real-time detection of anomalous events is critical for maximizing the discovery potential of the LHC. CICADA (Calorimeter Image Convolutional Anomaly Detection Algorithm) is a novel CMS trigger algorithm operating at the 40 MHz collision rate. By leveraging unsupervised deep learning techniques, CICADA aims to enable physics-model independent trigger decisions,...
Unsupervised learning algorithms enable insights from large, unlabeled datasets, allowing for feature extraction and anomaly detection that can reveal latent patterns and relationships often not found by supervised or classical algorithms. Modern particle detectors, including liquid argon time projection chambers (LArTPCs), collect a vast amount of data, making it impractical to save...
Low latency machine learning inference is vital for many high-speed imaging applications across various scientific domains. From analyzing fusion plasma [1] to rapid cell-sorting [2], there is a need for in-situ fast inference in experiments operating in the kHz to MHz range. External PCIe accelerators are often unsuitable for these experiments due to the associated data transfer overhead,...
Recent advancements in generative artificial intelligence (AI), including transformers, adversarial networks, and diffusion models, have demonstrated significant potential across various fields, from creative art to drug discovery. Leveraging these models in engineering applications, particularly in nanophotonics, is an emerging frontier. Nanophotonic metasurfaces, which manipulate light at...
Applications like high-energy physics and cybersecurity require extremely high throughput and low latency neural network (NN) inference. Lookup-table-based NNs address these constraints by implementing NNs purely as lookup tables (LUTs), achieving inference latency on the order of nanoseconds. Since LUTs are a fundamental FPGA building block, LUT-based NNs map to FPGAs easily. LogicNets (and...
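The core trick behind LUT-based NNs is that a neuron with a small number of low-precision inputs can be exhaustively tabulated and stored as a truth table. A minimal sketch, with hypothetical weights and a hard threshold standing in for a trained, quantized neuron:

import itertools
import numpy as np

def neuron_to_lut(weights, bias):
    """Enumerate every binary input pattern of a small neuron and record its
    thresholded output, yielding a truth table that maps directly onto FPGA LUTs."""
    lut = {}
    for bits in itertools.product((0, 1), repeat=len(weights)):
        activation = float(np.dot(bits, weights) + bias)
        lut[bits] = int(activation > 0.0)
    return lut

lut = neuron_to_lut(np.array([0.7, -0.3, 0.5, 0.2]), bias=-0.4)
print(lut[(1, 0, 1, 0)])  # inference is a single table lookup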
Recent advancements in Vision-Language Models (VLMs) have enabled complex multimodal tasks by processing text and image data simultaneously, significantly enhancing the field of artificial intelligence. However, these models often exhibit biases that can skew outputs towards societal stereotypes, thus necessitating debiasing strategies. Existing debiasing methods focus narrowly on specific...
As machine learning (ML) increasingly serves as a tool for addressing real-time challenges in scientific applications, the development of advanced tooling has significantly reduced the time required to iterate on various designs. Despite these advancements in areas that once posed major obstacles, newer challenges have emerged. For example, processes that were not previously considered...
We develop an automated pipeline to streamline neural architecture codesign for physics applications, reducing the need for ML expertise when designing models for a novel task. Our method employs a two-stage neural architecture search (NAS) design to enhance these models while accounting for hardware costs, leading to the discovery of more hardware-efficient neural architectures. The global search...
Deep learning, particularly employing the U-Net architecture, has become pivotal in cardiology, facilitating detailed analysis of heart anatomy and function. The segmentation of cardiac images enables the quantification of essential parameters such as myocardial viability, ejection fraction, cardiac chamber volumes, and morphological features. These segmentation methods operate autonomously...
The number of CubeSats launched for data-intensive applications is increasing due to the modularity and reduced cost these platforms provide. Consequently, there is a growing need for efficient data processing and compression. Tailoring onboard processing with Machine Learning to specific mission tasks can optimise downlink usage by focusing only on relevant data, ultimately reducing the...
Modern scientific instruments generate vast amounts of data at increasingly higher rates, outpacing traditional data management strategies that rely on large-scale transfers to offline storage for post-analysis. To enable next-generation experiments, data processing must be performed at the edge—directly alongside the scientific instruments. By integrating these instruments with...
In situ machine learning data processing for neuroscience probes can have wide-reaching applications from data filtering, event triggering, and ultimately real-time interventions at kilohertz frequencies intrinsic to natural systems. In this work, we present the integration of Machine Learning (ML) algorithms on an off-the-shelf neuroscience data acquisition platform by Spike Gadgets. The...
Artificial neural networks (ANNs) are capable of complex feature extraction and classification with applications in robotics, natural language processing, and data science. Yet, many ANNs have several key limitations; notably, current neural network architectures require enormous training datasets and are computationally inefficient. It has been posited that biophysical computations in single...
We introduce a smart pixel prototype readout integrated circuit (ROIC) fabricated using a 28 nm bulk CMOS process, which integrates a machine learning (ML) algorithm for data filtering directly within the pixel region. This prototype serves as a proof-of-concept for a potential Phase III pixel detector upgrade of the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC)....
Nowadays, the application of neural networks (NNs) has expanded across different industries (e.g., autonomous vehicles, manufacturing, natural-language processing, etc.) due to their improved accuracy. This has been made possible by the increased complexity of these networks, which demands greater computational effort and memory consumption. As a result, there is more demand for...
High-fidelity single-shot quantum state readout is crucial for advancing quantum technology. Machine-learning (ML) assisted qubit-state discriminators have shown high readout fidelity and strong resistance to crosstalk. By directly integrating these ML models into FPGA-based control hardware, fast feedback control becomes feasible, which is vital for quantum error correction and other...
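For context, the simplest qubit-state discriminator assigns each single-shot I/Q measurement to the nearest per-state centroid; the ML models referenced above, and their FPGA implementations, improve on this kind of baseline. A sketch with synthetic data (the cloud positions and spreads are made up for illustration):

import numpy as np

def fit_centroids(iq_shots, labels):
    """Per-state mean positions in the I/Q plane (a minimal baseline discriminator)."""
    return {s: iq_shots[labels == s].mean(axis=0) for s in np.unique(labels)}

def discriminate(iq_shot, centroids):
    """Assign a shot to the state whose centroid is closest."""
    return min(centroids, key=lambda s: np.linalg.norm(iq_shot - centroids[s]))

rng = np.random.default_rng(1)
shots = np.vstack([rng.normal((0, 0), 0.3, (500, 2)),    # synthetic |0> cloud
                   rng.normal((1, 1), 0.3, (500, 2))])   # synthetic |1> cloud
labels = np.repeat([0, 1], 500)
centroids = fit_centroids(shots, labels)
print(discriminate(np.array([0.9, 1.1]), centroids))     # -> 1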
The Electron Ion Collider (EIC) promises unprecedented insights into nuclear matter and quark-gluon interactions, with advances in artificial intelligence (AI) and machine learning (ML) playing a crucial role in unlocking its full potential. This talk will explore potential opportunities for AI/ML integration within the EIC program, drawn from broader discussions in the AI4EIC forum. I will...
Deploying Machine Learning (ML) models on Field-Programmable Gate Arrays (FPGAs) is becoming increasingly popular across various domains as a low-latency and low-power solution that helps manage large data rates generated by continuously improving detectors. However, developing ML models for FPGA deployment is often hindered by the time-consuming synthesis procedure required to evaluate...
Ultra-high-speed detectors are crucial in scientific and healthcare fields, such as medical imaging, particle accelerators and astrophysics. At the same time, upcoming large dark matter experiments, like the ARGO detector with an anticipated 200 m² detector surface, are generating massive amounts of data across a large number of channels, which increases hardware, energy and environmental costs....
High-Level Synthesis (HLS) techniques, coupled with domain-specific translation tools such as HLS4ML, have made the development of FPGA-based Machine Learning (ML) accelerators more accessible than ever before, allowing scientists to develop and test new models on hardware with unprecedented speed. However, these advantages come with significant costs in terms of implementation complexity. The...
We introduce the Differentiable Weightless Neural Network (DWN), a model based on interconnected lookup tables. Training of DWNs is enabled by a novel Extended Finite Difference technique for approximate differentiation of binary values. We propose Learnable Mapping, Learnable Reduction, and Spectral Regularization to further improve the accuracy and efficiency of these models. We evaluate...
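The general flavor of differentiating through a lookup table with finite differences can be illustrated by flipping one input bit at a time and recording the change in the table's output. This is only a schematic one-bit finite difference over an invented random table, not the Extended Finite Difference technique introduced above:

import itertools
import numpy as np

rng = np.random.default_rng(0)
# A random 3-input lookup table with real-valued entries (stand-in for a DWN node).
lut = {bits: rng.normal() for bits in itertools.product((0, 1), repeat=3)}

def finite_difference_grads(lut, bits):
    """Approximate the sensitivity of the LUT output to each binary input by
    evaluating the table with that bit set to 1 versus 0."""
    grads = []
    for i in range(len(bits)):
        hi = list(bits); hi[i] = 1
        lo = list(bits); lo[i] = 0
        grads.append(lut[tuple(hi)] - lut[tuple(lo)])
    return np.array(grads)

print(finite_difference_grads(lut, (1, 0, 1)))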
The increasing demand for efficient machine learning (ML) acceleration has intensified the need for user-friendly yet flexible solutions, particularly for edge computing. Field Programmable Gate Arrays (FPGAs), with their high configurability and low-latency processing, offer a compelling platform for this challenge. Our presentation gives an update on an end-to-end ML acceleration flow utilizing...
Transformers are becoming increasingly popular in fields such as natural language processing, speech processing, and computer vision. However, due to the high memory bandwidth and power requirements of Transformers, contemporary hardware is gradually unable to keep pace with the trend of larger models. To improve hardware efficiency, increase throughput, and reduce latency, there has been a...
Neutrinoless double beta ($0 \nu \beta \beta$) decay is a Beyond the Standard Model process that, if discovered, could prove the Majorana nature of neutrinos—that they are their own antiparticles. In their search for this process, $0 \nu \beta \beta$ decay experiments rely on signal/background discrimination, which is traditionally approached as a supervised learning problem. However, the...
High-purity germanium spectrometers are widely used in fundamental physics and beyond. Their excellent energy resolution enables the detection of electromagnetic signals and recoils down to below 1 keV ionization energy. However, the detectors are also very sensitive to all types of noise, which will overwhelm the trigger routines of the data acquisition system and significantly...
The coalescence of a binary neutron star (BNS) system in the event GW170817, which generated gravitational waves (GW) and was accompanied by a kilonova (KN) electromagnetic (EM) counterpart, has been a prime topic of interest for the astronomy community in recent times, as it provided much insight into multi-messenger astronomy. Since its discovery in 2017, several research teams have put...
As scientific experiments are generating increasingly larger and more complex datasets, the need to accelerate scientific workflows becomes ever more pressing. Recent advancements in machine learning (ML) algorithms, combined with the power of cutting-edge GPUs, have led to significant performance gains. However, optimizing computational efficiency remains crucial to minimize processing...
Reflection High Energy Electron Diffraction (RHEED) is a technique for real-time monitoring of surface crystal structures during thin-film deposition. By directing a high-energy electron beam at a shallow angle onto a crystalline surface, RHEED produces diffraction patterns that reveal valuable information about both the bulk structure and the surface's atomic arrangement. The resulting...
Modern AI model creation requires ample computational power to process data in both predictive and learning phases. Due to memory and processing constraints, edge and IoT electronics using such models can be forced to outsource optimization and training to either the cloud or pre-deployment development. This poses issues when optimization and classification are required from sensor and...
Detectors at next-generation high-energy physics experiments face several daunting requirements: high data rates, damaging radiation exposure, and stringent constraints on power, space, and latency. In light of this, recent detector design studies have explored the use of machine learning (ML) in readout Application-Specific Integrated Circuits (ASICs) to run intelligent inference and data...
Anomaly detection (AD) in the earliest stage of LHC trigger systems represents a fundamentally new tool to enable data-driven discoveries. While initial efforts have focused on adapting powerful offline algorithms to these high-throughput streaming systems, the question of how such algorithms should adapt to constantly-evolving detector conditions remains a major challenge. In this work, we...
As deep learning methods and particularly Large Language Models have shown huge promise in a variety of applications, we attempt to apply a BERT (Bidirectional Encoder Representations from Transformers) model developed by Google, built on the well-known multi-headed attention mechanism, to a high energy physics problem. Specifically, we focus on the process of top quark-antitop quark decay...
Diffusion is a natural phenomenon in fluids. Its measurement can be done optically by seeding an otherwise featureless fluid with tracer particles and observing their motion using a microscope. However, existing particle-based diffusion coefficient measurement algorithms have multiple failure modes, especially when the fluid has a flow, or the particles are defocused. This work uses...
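A standard particle-based baseline, to which the failure modes above refer, estimates the diffusion coefficient from the slope of the mean squared displacement, MSD(t) = 2·d·D·t in d dimensions. A minimal sketch, assuming idealized 2D Brownian tracks with no flow or defocus (the array layout and units are illustrative):

import numpy as np

def diffusion_coefficient(tracks, dt, dim=2):
    """Estimate D from the mean squared displacement of tracked particles by
    fitting MSD(t) = 2 * dim * D * t. tracks has shape (n_particles, n_frames, dim)."""
    disp = tracks - tracks[:, :1, :]                    # displacement from first frame
    msd = np.mean(np.sum(disp ** 2, axis=-1), axis=0)   # average over particles
    t = np.arange(tracks.shape[1]) * dt
    slope = np.polyfit(t[1:], msd[1:], 1)[0]
    return slope / (2 * dim)

# Synthetic Brownian motion with D = 0.5 um^2/s, dt = 0.01 s.
rng = np.random.default_rng(0)
steps = rng.normal(scale=np.sqrt(2 * 0.5 * 0.01), size=(200, 500, 2))
print(diffusion_coefficient(np.cumsum(steps, axis=1), dt=0.01))  # ~0.5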
In materials science, 4D Scanning Transmission Electron Microscopy (4D STEM) produces a dataset of images formed by electrons passing through a thin specimen, with the electron beam focused on a fine spot [1], allowing materials scientists to learn structural properties. Oxley et al. showed that deep learning is powerful for distinguishing structures embedded within the data [2]. However, Oxley et...
Many studies in recent years have shown that neural networks (NNs) trained using jet sub-structure observables in ultra-relativistic heavy ion collision events are capable of significantly increasing the resolution of jet-$p_{\mathrm{T}}$ background corrections relative to the standard area-based technique. However, modifications to jet substructure due to quenching in quark-gluon plasma (QGP) in central...
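For reference, the standard area-based technique corrects each jet's transverse momentum as $p_{\mathrm{T}}^{\mathrm{corr}} = p_{\mathrm{T}}^{\mathrm{raw}} - \rho A$, where $\rho$ is the event-by-event median background momentum density and $A$ is the jet's catchment area.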
We demonstrate the use of the MLP-Mixer architecture for fast jet classification in high-energy physics. The MLP-Mixer architecture is a simple and efficient architecture consisting of MLP blocks applied along different directions of the input tensor. It was first proposed by Tolstikhin et al. and has been shown to be competitive with state-of-the-art architectures...
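To make "MLP blocks applied along different directions" concrete, the sketch below implements one Mixer block in plain NumPy: a token-mixing MLP acts across the particle/patch axis, then a channel-mixing MLP acts across the feature axis. Layer norms and the original GELU activation are omitted, and all shapes and parameters are illustrative, not those used in this work:

import numpy as np

def mlp(x, w1, b1, w2, b2):
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2   # ReLU stands in for GELU here

def mixer_block(x, tok_params, ch_params):
    """One schematic MLP-Mixer block: mix across tokens, then across channels,
    each with a residual connection. x has shape (batch, tokens, channels)."""
    x = x + np.transpose(mlp(np.transpose(x, (0, 2, 1)), *tok_params), (0, 2, 1))
    return x + mlp(x, *ch_params)

B, T, C, H = 2, 16, 8, 32
rng = np.random.default_rng(0)
tok = (rng.normal(size=(T, H)), np.zeros(H), rng.normal(size=(H, T)), np.zeros(T))
ch = (rng.normal(size=(C, H)), np.zeros(H), rng.normal(size=(H, C)), np.zeros(C))
y = mixer_block(rng.normal(size=(B, T, C)), tok, ch)   # output shape (2, 16, 8)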
We present our approach to mitigate the Beam-Induced Background (BIB) in a muon collider, leveraging machine learning. We then utilize pruning and quantization-aware training to enable real-time data processing, and demonstrate that we can distinguish BIB energy deposits from physics processes of interest with significant accuracy using FPGAs. Our work is a first proof-of-concept of the ability...
Density Functional Theory (DFT) is one of the most successful methods for computing ground-state properties of molecules and materials. In its purest form ("orbital-free DFT"), it transforms a $3N$-dimensional interacting electron problem into one 3D integro-differential problem at the cost of approximating two functionals of the electron density $n(\mathbf{r})$, one of them being for the...
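For context (the standard textbook form, not specific to this work), the orbital-free total energy can be written as $E[n] = T_s[n] + \int v_{\mathrm{ext}}(\mathbf{r})\, n(\mathbf{r})\, d^3r + E_{\mathrm{H}}[n] + E_{\mathrm{xc}}[n]$, where the non-interacting kinetic energy $T_s[n]$ and the exchange-correlation energy $E_{\mathrm{xc}}[n]$ are the two density functionals that must be approximated; the ground-state density follows from minimizing $E[n]$ subject to $\int n(\mathbf{r})\, d^3r = N$.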
Particle tracking at Large Hadron Collider (LHC) experiments is a crucial component of particle reconstruction, yet it remains one of the most computationally challenging tasks in this process. As we approach the High-Luminosity LHC era, the complexity of tracking is expected to increase significantly. Leveraging coprocessors such as GPUs presents a promising solution to the rising...
One potential way to meet the quickly growing computing demands in High Energy Physics (HEP) experiments is by leveraging specialized processors such as GPUs. The “as a service” (AAS) approach helps improve utilization of GPU resources by allowing one GPU to serve a wide range of tasks, significantly reducing idle time. The SONIC project implements the AAS approach for a variety of widely used...
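As an illustration of the client side of an inference-as-a-service workflow, the sketch below uses the NVIDIA Triton gRPC client, which SONIC-style deployments commonly target; the server URL, model name, and tensor names are placeholders, not the actual SONIC configuration:

import numpy as np
import tritonclient.grpc as grpcclient

# Placeholder endpoint and model; in an AAS setup the server hosts the GPU model
# and many CPU-side clients share it, keeping the coprocessor busy.
client = grpcclient.InferenceServerClient(url="localhost:8001")

batch = np.random.rand(32, 16).astype(np.float32)
inp = grpcclient.InferInput("INPUT0", list(batch.shape), "FP32")
inp.set_data_from_numpy(batch)
out = grpcclient.InferRequestedOutput("OUTPUT0")

result = client.infer(model_name="my_model", inputs=[inp], outputs=[out])
scores = result.as_numpy("OUTPUT0")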
This presentation introduces the Intel FPGA AI Suite alongside the AI Tensor Blocks recently incorporated by Intel into its latest FPGA device families for deep learning inference. These innovative FPGA components bring real-time, low-latency, and energy-efficient processing to the forefront. They are supported by the inherent advantages of Intel FPGAs,...
Recent advancements in the use of machine learning (ML) techniques on field-programmable gate arrays (FPGAs) have allowed for the implementation of embedded neural networks with extremely low latency. This is invaluable for particle detectors at the Large Hadron Collider, where latency and area are strictly bounded. The hls4ml framework converts trained ML model software...
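A typical hls4ml conversion, for reference, follows the pattern below; the Keras model, FPGA part number, and output directory are placeholders, and fixed-point precision and parallelism are tuned through the generated configuration:

import hls4ml
from tensorflow import keras

# Placeholder Keras model; in practice this is the trained network to deploy.
model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(16,)),
    keras.layers.Dense(5, activation="softmax"),
])

# Derive an HLS configuration (precision, reuse factors) from the model.
config = hls4ml.utils.config_from_keras_model(model, granularity="name")

hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir="hls4ml_prj",          # placeholder project directory
    part="xcvu13p-flga2577-2-e",      # placeholder FPGA part
)
hls_model.compile()                   # builds a bit-accurate C simulation for validation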
Fast, accurate simulations are becoming increasingly necessary for the precision measurements and BSM searches planned by LHC experiments in Run 3 and beyond. The recent breakthroughs in deep generative modelling in computer vision and natural language processing offer a promising and exciting avenue for improving the speed of current LHC simulation paradigms by up to 3 orders of magnitude. We...
This work introduces advanced computational techniques for modeling the time evolution of compact binary systems using machine learning. The dynamics of compact binary systems, such as black holes and neutron stars, present significant nonlinear challenges due to the strong gravitational interactions and the requirement for precise numerical simulations. Traditional methods, like the...
Optimizing the inference of Graph Neural Networks (GNNs) for track finding is important for improving the quality of particle collision event reconstruction. In high-energy physics experiments, such as those at the Large Hadron Collider (LHC), detectors generate enormous volumes of complex, noisy data from particles colliding at extremely high energies. Track finding is the task of reconstructing the paths of...
Upgrades to the CMS experiment will see the average pileup go from 50 to 140 and eventually 200. With current algorithms, this would mean that almost 50% of the High Level Trigger time budget would be spent on particle track reconstruction. Many ML methods have been explored to address the challenge of slow particle tracking at high pileup. Reinforcement learning is presented as a novel method...
Deploying large CNNs on resource-constrained hardware such as FPGAs poses significant challenges, particularly in balancing high throughput with limited resources and power consumption. To address these challenges, hls4ml was leveraged to accelerate inference through a streaming architecture, in contrast to programmable engines with dedicated instruction sets commonly used to scale to...
Pixel detectors are highly valuable for their precise measurement of charged particle trajectories. However, next-generation detectors will demand even smaller pixel sizes, resulting in extremely high data rates surpassing those at the HL-LHC. This necessitates a “smart” approach for processing incoming data, significantly reducing the data volume for a detector’s trigger system to select...
Tracking algorithms play a vital role in both online and offline event reconstruction in Large Hadron Collider (LHC) experiments; however, they are the most time-consuming component in the particle reconstruction chain. To reduce processing time, existing tracking algorithms have been adapted for use on massively parallel coprocessors such as GPUs. Nevertheless, fully utilizing the...