1–5 Sept 2025
ETH Zurich
Europe/Zurich timezone

Contribution List

126 out of 126 displayed
  1. 01/09/2025, 08:30
  2. Benjamin Ramhorst (ETH Zurich)
    01/09/2025, 09:00
    Tutorials
    Tutorial

    FPGAs provide unique advantages in the realm of machine learning acceleration. Unlike CPUs and GPUs, FPGAs allow for custom parallelism, data-type precision, and dataflow tailored specifically to the workload. Their reconfigurability enables the design of optimised hardware circuits that reduce latency and power consumption and improve throughput. Some common examples of FPGA-accelerated...

    Go to contribution page
  3. Marta Andronic (Imperial College London), Mr Oliver Cassidy (Imperial College London)
    01/09/2025, 09:00
    Tutorial

    Neural networks (NNs) have gained significant interest in recent years due to their prevalence in AI applications. Lookup table (LUT) based NN architectures have emerged as a promising solution for ultra-low latency inference on reconfigurable hardware such as field programmable gate arrays (FPGAs). These techniques promise significant enhancements in both resource efficiency and inference...

    Go to contribution page
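    As background to this contribution: the core idea behind LUT-based NN inference is that a small quantized neuron has so few possible inputs that its output can be precomputed into a truth table, turning inference into a single lookup. A minimal illustrative sketch (not from the talk; the neuron and its weights are hypothetical):

```python
# Sketch of the LUT-based NN idea: enumerate all inputs of a small
# quantized neuron and store its outputs in a lookup table.
from itertools import product

def neuron(bits):
    # Hypothetical 4-input binary neuron: thresholded weighted sum.
    weights = [3, -2, 1, 2]
    s = sum(w * b for w, b in zip(weights, bits))
    return 1 if s >= 1 else 0

# Precompute the 2^4 = 16-entry truth table once, offline.
lut = {bits: neuron(bits) for bits in product((0, 1), repeat=4)}

# "Inference" is now a single lookup -- on an FPGA, one hardware LUT.
assert lut[(1, 0, 0, 0)] == neuron((1, 0, 0, 0))
```

    The exponential growth of the table (2^k entries for k inputs) is also why such networks are hard to scale, a point several contributions in this programme address.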
  4. Mr Giovanni Gozzi (Politecnico di Milano), Mr Michele Fiorito (Politecnico di Milano), Dr Vito Giovanni Castellana (Pacific Northwest National Laboratory), Dr Antonino Tumeo (Pacific Northwest National Laboratory), Fabrizio Ferrandi (Politecnico di Milano)
    01/09/2025, 09:00
    Tutorial

    This tutorial explores the growing demand for domain-specific hardware accelerators driven by the rapid evolution of AI and data analytics. Traditional hardware design cycles are too slow to keep up with the pace of algorithmic innovation. To address this, new agile hardware design methodologies are emerging, leveraging compiler technologies and High-Level Synthesis (HLS) to automate and...

    Go to contribution page
  5. Benjamin Ramhorst (ETH Zurich)
    01/09/2025, 11:00
    Tutorial

    As Moore’s Law and Dennard Scaling reach their limits, computing is shifting toward heterogeneous hardware for large-scale data processing. Cloud vendors are deploying accelerators, like GPUs, DPUs, and FPGAs, to meet growing computational demands of ML and big data.

    While FPGAs offer great flexibility and performance, practically integrating them in larger systems remains challenging due...

    Go to contribution page
  6. Chang Sun (California Institute of Technology (US))
    01/09/2025, 11:00
    Tutorials
    Tutorial

    Neural networks with a latency requirement on the order of microseconds are widely used at the CERN Large Hadron Collider, particularly in the low-level trigger system. To satisfy this latency requirement, these neural networks are often deployed on FPGAs.

    This tutorial aims to provide a practical, hands-on guide of a software-hardware co-design workflow using the HGQ2 and da4ml libraries....

    Go to contribution page
  7. Mr Giovanni Gozzi (Politecnico di Milano), Mr Michele Fiorito (Politecnico di Milano), Dr Vito Giovanni Castellana (Pacific Northwest National Laboratory), Dr Antonino Tumeo (Pacific Northwest National Laboratory), Fabrizio Ferrandi (Politecnico di Milano), Nicolo Ghielmetti (CERN)
    01/09/2025, 11:00
    Tutorial

    This tutorial explores the growing demand for domain-specific hardware accelerators driven by the rapid evolution of AI and data analytics. Traditional hardware design cycles are too slow to keep up with the pace of algorithmic innovation. To address this, new agile hardware design methodologies are emerging, leveraging compiler technologies and High-Level Synthesis (HLS) to automate and...

    Go to contribution page
  8. Dmitri Demler
    01/09/2025, 11:00
    Tutorial

    Machine learning has become a critical tool for analysis and decision-making across a wide range of scientific domains, from particle physics to materials science. However, the deployment of neural networks in resource-constrained environments, such as hardware accelerators and edge devices, remains a significant challenge. This often requires specialized expertise in both neural architecture...

    Go to contribution page
  9. 01/09/2025, 12:30
  10. Maciej Besta (ETH Zurich)
    01/09/2025, 14:00
    Invited Talks
  11. Giacomo Indiveri (ETH Zurich)
    01/09/2025, 14:45
    Invited Talks

    While machine learning has made tremendous progress in recent years, there is still a large gap between artificial and natural intelligence. Closing this gap requires combining fundamental research in neuroscience with mathematics, physics, and engineering to understand the principles of neural computation and cognition. Mixed-signal subthreshold analog and asynchronous digital electronic...

    Go to contribution page
  12. Vava Gligorov (Centre National de la Recherche Scientifique (FR))
    01/09/2025, 15:50
    Invited Talks

    The real-time processing of data created by the Large Hadron Collider's (LHC) experiments, amounting to over 10% of worldwide internet traffic, is one of the greatest computing challenges ever attempted. I will discuss the concrete applications of real-time processing in the LHC's main experiments, and the technological innovations in this area over the past decades. I will also reflect on the...

    Go to contribution page
  13. Maximilian Dax (ELLIS Institute Tübingen)
    01/09/2025, 16:35
    Invited Talks
  14. 01/09/2025, 17:20
  15. Patrick Kidger (Cradle.bio)
    02/09/2025, 09:00
    Invited Talks

    This talk provides an overview of several libraries in the open-source JAX ecosystem (such as Equinox, Diffrax, Optimistix, ...). In short, we have been building an "autodifferentiable GPU-capable scipy". These libraries offer the foundational core of tools that have made it possible for us to train neural networks (e.g. score-based diffusions for image generation), solve PDEs, and smoothly...

    Go to contribution page
  16. Dr Andrea Cossettini (ETH Zurich)
    02/09/2025, 09:45
    Invited Talks

    Most commercial wearables still capture only basic metrics such as step counts or heart rate, and remain closed systems without access to raw data. In this talk, I will present our holistic approach to full-body biosignal intelligence, where ultra-low-power embedded platforms and machine learning algorithms are co-designed to capture and process signals from the brain, eyes, muscles, and...

    Go to contribution page
  17. Yulia Sandamirskaya (Zurich University of Applied Sciences)
    02/09/2025, 11:00
    Invited Talks
  18. Felix Jentzsch
    02/09/2025, 13:00
    Standard Talk

    Custom FPGA dataflow accelerators for DNN inference can enable unprecedented performance and efficiency for many applications. Dataflow accelerator compilers, such as the FINN framework, have improved in recent years and allow practitioners to explore this technology without requiring in-depth FPGA knowledge.

    However, the overall design process remains quite tedious, time-consuming, and...

    Go to contribution page
  19. Roope Oskari Niemi
    02/09/2025, 13:20
    Standard Talk

    As the demand for efficient machine learning on resource-limited devices grows, model compression techniques like pruning and quantization have become increasingly vital. Despite their importance, these methods are typically developed in isolation, and while some libraries attempt to offer unified interfaces for compression, they often lack support for deployment tools such as hls4ml. To...

    Go to contribution page
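    For context, the two compression techniques this abstract names can be sketched in a few lines of plain Python; this is an illustrative toy, not the contribution's library or its interfaces:

```python
# Illustrative compression primitives: magnitude pruning (zero out the
# smallest weights) and symmetric uniform quantization to integers.

def prune(weights, sparsity):
    """Zero (at least) the smallest-magnitude fraction of weights."""
    k = int(len(weights) * sparsity)
    cut = sorted(abs(w) for w in weights)[k - 1] if k else -1.0
    return [0.0 if abs(w) <= cut else w for w in weights]

def quantize(weights, bits):
    """Map weights to signed integer codes plus one float scale."""
    qmax = 2 ** (bits - 1) - 1
    m = max(abs(w) for w in weights)
    scale = m / qmax if m else 1.0
    return [round(w / scale) for w in weights], scale

w = [0.9, -0.05, 0.4, -0.7, 0.01, 0.3]
pruned = prune(w, sparsity=0.5)      # half the weights become 0
q, scale = quantize(pruned, bits=8)  # 8-bit integer codes + scale
```

    Real toolflows apply these per-layer with retraining; the point of the contribution is precisely that such methods are usually developed in isolation from deployment tools like hls4ml.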
  20. Olivia Weng
    02/09/2025, 13:40
    Standard Talk

    As neural networks (NNs) are increasingly used to provide edge intelligence, there is a growing need to make the edge devices that run them robust to faults. Edge devices must mitigate the resulting hardware failures while maintaining strict constraints on power, energy, latency, throughput, memory size, and computational resources. Edge NNs require fundamental changes in model...

    Go to contribution page
  21. Manuel Valentin (Northwestern University)
    02/09/2025, 14:00
    Standard Talk

    On-chip learning has the potential to unlock low-latency, low-power, and continuously adaptive AI directly on edge devices. However, research in this area remains limited by the lack of accessible hardware toolchains that support backpropagation. To address this gap, we propose ENABOL, a hardware-efficient extension of the HLS4ML toolchain that enables customizable backpropagation support...

    Go to contribution page
  22. Chang Sun (California Institute of Technology (US))
    02/09/2025, 14:20
    Standard Talk

    Neural networks with a latency requirement on the order of microseconds, like the ones used at the CERN Large Hadron Collider, are typically deployed on FPGAs fully pipelined with an initiation interval (II) of 1. A bottleneck for the deployment of such neural networks is area utilization, which is directly related to the required constant matrix-vector multiplication (CMVM) operations. In this work, we propose an efficient...

    Go to contribution page
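    As background: because the weight matrix is constant at synthesis time, a CMVM can share partial sums that recur across rows instead of recomputing them, which is what makes it cheaper than a generic matrix-vector multiply. A toy sketch of the idea (the matrix and the particular sharing are hypothetical, not the contribution's method):

```python
# Constant matrix-vector multiplication (CMVM): with a fixed matrix,
# common subexpressions across rows can be computed once and reused.

M = [[1, 1, 0, 1],
     [1, 1, 1, 0],
     [0, 0, 1, 1]]

def cmvm_naive(x):
    # Generic form: one dot product per row, 5 additions in total.
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def cmvm_shared(x):
    # Rows 0 and 1 both contain the pattern x0 + x1: compute it once.
    s01 = x[0] + x[1]
    return [s01 + x[3], s01 + x[2], x[2] + x[3]]  # 4 additions total

x = [3, 5, 7, 11]
assert cmvm_naive(x) == cmvm_shared(x)
```

    On an FPGA the saved additions translate directly into saved area, which is the utilization bottleneck the abstract describes.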
  23. Ioannis Xiotidis (CERN)
    02/09/2025, 14:40
    Standard Talk

    The ATLAS Level-0 Global Trigger is a mission-critical system aiming to take advantage of the full calorimeter granularity during Run-4 and beyond. Level-0 Global will execute a cascade of trigger algorithms combining both the calorimeter information and the muons. Within the Next Generation Trigger project at CERN, there is a dedicated work package (WP2.1) exploring large deployment of...

    Go to contribution page
  24. Kirsten Köbschall
    02/09/2025, 16:00
    Standard Talk

    In the era of continuous data generation, real-time processing of data streams has become crucial for timely, adaptive, and context-aware decision-making. However, maintaining effective learning models in such dynamic environments requires carefully balancing prediction performance, transparency and energy consumption.

    In the talk, we will present two new state-of-the-art methods for...

    Go to contribution page
  25. Alexander Redding (UC San Diego)
    02/09/2025, 16:20
    Standard Talk

    The widespread deployment of embedded ML systems has created a need for resilient, fault-tolerant hardware and software capable of operating in inherently noisy conditions. While the standardization of low-precision (≤ 8-bit) datatypes has allowed for reduced training and inference costs and increased interoperability across commercial accelerators, clear guidelines for robust implementation...

    Go to contribution page
  26. Yuan-Tang Chou (University of Washington (US))
    02/09/2025, 16:40
    Standard Talk

    The rising computational demands of increasing data rates and complex machine learning (ML) algorithms in large-scale scientific experiments have driven the adoption of the Services for Optimized Network Inference on Coprocessors (SONIC) framework. SONIC accelerates ML inference by offloading tasks to local or remote coprocessors, optimizing resource utilization. Its portability across diverse...

    Go to contribution page
  27. Serio Angelo Maria Agriesti (Department of Technology, Management and Economics, Technical University of Denmark, Lyngby, Denmark)
    02/09/2025, 17:00
    Standard Talk

    Most current machine learning (ML) applications are purely data-driven solutions with little consideration for the underlying problem dynamics, limiting them to in-distribution applications. To tackle this limitation, a stream of literature is emerging to address out-of-distribution (OOD) performance: algorithmic alignment, which focuses on embedding algorithmic structures into ML architectures...

    Go to contribution page
  28. Dimitrios Danopoulos (CERN)
    02/09/2025, 17:20
    Standard Talk

    Matrix-vector (GEMV) operations are a common building block in many deep learning models, particularly for large dense layers found in convolutional neural networks (CNNs) and multi-layer perceptrons (MLPs). Despite their importance, GEMV kernels have historically underperformed compared to matrix-matrix (GEMM) operations due to their lower arithmetic intensity and limited data reuse, making...

    Go to contribution page
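    The arithmetic-intensity gap the abstract refers to can be made concrete with a back-of-the-envelope calculation (an illustrative sketch, not taken from the talk): a GEMV performs about two floating-point operations per matrix element it reads, while a GEMM amortizes each element over an entire row or column.

```python
# Back-of-the-envelope arithmetic intensity (flops per element moved),
# illustrating why GEMV underperforms GEMM on the same hardware.

def gemv_intensity(n):
    flops = 2 * n * n       # n dot products of length n
    moved = n * n + 2 * n   # the matrix plus input and output vectors
    return flops / moved    # approaches 2: memory-bound

def gemm_intensity(n):
    flops = 2 * n ** 3      # n^2 dot products of length n
    moved = 3 * n * n       # two input matrices plus one output matrix
    return flops / moved    # equals 2n/3: compute-bound for large n
```

    At n = 1024 this gives roughly 2 versus roughly 683 flops per element moved, which is why GEMV kernels are dominated by memory bandwidth rather than compute.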
  29. Adam Thompson (NVIDIA)
    03/09/2025, 09:00
    Invited Talks

    From radio telescopes to particle accelerators and electron microscopes, scientific instruments produce tremendous amounts of data at equally high rates; previous architectures that have relied on offline storage and large data transfers are unable to keep up. The future of scientific discovery is interactive, streaming, and AI driven, placing the autonomous and intelligent instrument at the...

    Go to contribution page
  30. Bozidar Radunovic (Microsoft Research)
    03/09/2025, 09:45
    Invited Talks
  31. Luigi Cruz (SETI)
    03/09/2025, 11:00

    As digitizer technologies scale, efficient processing of massive amounts of sensor data is essential for the next generation of science projects. This talk focuses on the next-generation electromagnetic signal processing pipeline developed at the Allen Telescope Array. Backed by the NVIDIA Holoscan SDK, this pipeline utilizes cutting-edge technologies to address the three key pillars of...

    Go to contribution page
  32. 03/09/2025, 11:45
  33. Sabrina Giorgetti (Universita e INFN, Padova (IT))
    03/09/2025, 13:00
    Standard Talk

    AXOL1TL is an anomaly detection (AD) trigger algorithm integrated into the Global Trigger (GT) of the CMS Level-1 Trigger (L1T) system since 2024. The GT reduces the event rate from proton–proton collisions at the LHC, lowering it from 40 MHz to 100 kHz within a fixed latency of 50 ns. The AD algorithm, implemented in the FPGA firmware of the GT board, uses an autoencoder to assign an anomaly...

    Go to contribution page
  34. Kenny Jia (Stanford University/ SLAC)
    03/09/2025, 13:20
    Contributed Talks
    Standard Talk

    The absence of BSM physics discoveries at the LHC suggests new physics could lie outside current trigger schemes. By applying unsupervised ML–based anomaly detection, we gain a model-agnostic way of spotting anomalous signatures that deviate from the current trigger’s expectations. Here we introduce a Run-3 trigger chain that embeds fast anomaly detection algorithms in both hardware and...

    Go to contribution page
  35. Christopher Edward Brown (CERN)
    03/09/2025, 13:40
    Standard Talk

    At the Phase-2 Upgrade of the CMS Level-1 Trigger (L1T), particles will be reconstructed by linking charged particle tracks with clusters in the calorimeters and muon tracks from the muon station. The 200 pileup interactions will be mitigated using primary vertex reconstruction for charged particles and a weighting for neutral particles based on the distribution of energy in a small area. Jets...

    Go to contribution page
  36. Deven Misra (University of Tokyo)
    03/09/2025, 14:00
    Standard Talk

    Belle II is a luminosity frontier experiment located at the SuperKEKB asymmetric $e^+ e^-$ collider, operating at the $\Upsilon(4S)$ resonance. The $\tau$ physics program at Belle II involves both probes of new physics and precision measurements of standard model parameters with large statistics. SuperKEKB is projected to reach a luminosity of $6\times 10^{35}~\text{cm}^{-2}\text{s}^{-1}$ in...

    Go to contribution page
  37. Davide Fiacco (Sapienza Universita e INFN, Roma I (IT))
    03/09/2025, 14:20
    Standard Talk

    The High Luminosity upgrade of the Large Hadron Collider (HL-LHC) presents a demanding environment for real-time data processing, with substantially increased event rates requiring faster and more efficient trigger systems. This study explores the deployment of graph neural networks (GNNs) on field-programmable gate arrays (FPGAs) for fast and accurate inference within future muon trigger...

    Go to contribution page
  38. Leon Bozianu (Universite de Geneve (CH))
    03/09/2025, 14:40
    Standard Talk

    The ATLAS trigger system will undergo a comprehensive upgrade in advance of the HL-LHC programme. In order to deal with the increased data bandwidth, trigger algorithms will be required to satisfy stricter latency requirements. We propose a method to speed up the current calorimeter-only preselection step and to aid trigger decisions for hadronic signals containing jets. We demonstrate the use...

    Go to contribution page
  39. Duc Hoang (Massachusetts Inst. of Technology (US))
    03/09/2025, 16:00
    Standard Talk

    Optimized FPGA implementations of tiny neural networks are crucial for low-latency and hardware-efficient inference for a variety of applications. Neural networks based on lookup tables (LUTs) are a standard technique for such problems due to their hardware efficiency and strong expressivity. However, such networks are often difficult to scale up as their resource usage scales exponentially...

    Go to contribution page
  40. Eric Anton Moreno (Massachusetts Institute of Technology (US))
    03/09/2025, 16:20
    Standard Talk

    Modern foundation models (FMs) have pushed the frontiers of language, vision, and multi-modal tasks by training ever-larger neural networks (NNs) on unprecedented volumes of data. The use of FMs has yet to be established in collider physics, which lacks both a comparably sized, general-purpose dataset on which to pre-train universal event representations and a clear demonstrable need....

    Go to contribution page
  41. Jan-Frederik Schulte (Purdue University (US))
    03/09/2025, 16:40
    Standard Talk

    The analysis of point cloud data, for example signals from charged particles recorded by detectors in high energy physics (HEP) experiments, can be significantly enhanced and accelerated by the application of machine learning models. In recent years, transformer architectures have come into focus as offering excellent model performance. However, for traditional transformers, the need to compute...

    Go to contribution page
  42. Bo-Cheng Lai
    03/09/2025, 17:00
    Standard Talk

    The Interaction Network (IN) algorithm has shown great promise for particle tracking applications at the Large Hadron Collider (LHC), where identifying complex particle trajectories from raw detector data is a computationally intensive task. IN leverages graph-based representations of detector hits to learn relationships between particle interactions, making it well-suited for this domain....

    Go to contribution page
  43. Christina Reissel (Massachusetts Inst. of Technology (US)), Katya Govorkova (Massachusetts Inst. of Technology (US)), Philip Coleman Harris (Massachusetts Inst. of Technology (US))
    03/09/2025, 17:20
  44. 03/09/2025, 17:40
  45. Luca Benini (ETH Zurich)
    04/09/2025, 09:30
    Invited Talks

    AI is accelerating into the generative era, and it is poised to disrupt multiple businesses and applications. With the increasing focus on edge and extreme-edge, near-sensor applications, inference is becoming the key workload and computational challenge. Computing systems need to scale out and scale up to meet the challenge. In this talk I will discuss how to scale up chip(lets) for efficient...

    Go to contribution page
  46. Yaman Umuroglu (AMD)
    04/09/2025, 10:15
    Invited Talks

    Beyond the well-known highlights in computer vision and natural language, AI is steadily expanding into new application domains. This Pervasive AI trend requires supporting diverse and fast-moving application requirements, ranging from specialized I/O to fault tolerance and limited resources, all the while retaining high performance and low latency. Adaptive compute architectures such as AMD...

    Go to contribution page
  47. Giovanna Salvi (University of Michigan (US))
    04/09/2025, 11:30
    Standard Talk

    The trigger systems of ATLAS and CMS currently reject vast numbers of potentially valuable collision events due to their conservative, static designs, a limitation that directly hampers discovery potential. We propose an alternative to these rigid, hand-tuned menus with an autonomous controller capable of dynamically optimizing trigger performance in real time.
    In this work, we demonstrate...

    Go to contribution page
  48. Maximilian Heer (ETH Zurich)
    04/09/2025, 11:50
    Standard Talk

    Machine Learning (ML) techniques are increasingly applied to the optimization of complex computing systems, but their integration into core low-level system mechanisms remains limited. A key barrier is the lack of accessible, high-performance interfaces at the boundary between software and hardware, as well as hardware-offloaded ML inference at full system speed. In this presentation, we...

    Go to contribution page
  49. Liv Helen Vage (Princeton University (US))
    04/09/2025, 12:10
    Standard Talk

    Tuning hyperparameters of ML models, especially large ML models, can be time consuming and computationally expensive. As a potential solution, several recent papers have explored hyperparameter transfer. Under certain conditions, the optimal hyperparameters of a small model are also optimal for larger models. One can therefore tune only the small model and transfer the hyperparameters to the...

    Go to contribution page
  50. Benedikt Maier (Imperial College (GB))
    04/09/2025, 12:30
  51. 04/09/2025, 12:40
  52. Zhiqiang (Walkie) Que (Imperial College London)
    04/09/2025, 13:30
    Standard Talk

    Graph Neural Networks (GNNs), particularly Interaction Networks (INs), have shown exceptional performance for jet tagging at the CERN High-Luminosity Large Hadron Collider. However, their computational complexity and irregular memory access patterns pose significant challenges for deployment on FPGAs in hardware trigger systems, where strict latency and resource constraints apply.

    In this...

    Go to contribution page
  53. Benjamin Weiss (Cornell University), Jannicke Pearkes (University of Colorado Boulder (US))
    04/09/2025, 13:50
    Standard Talk

    The Smartpixels project is a coordinated effort to co-design pixel ASICs, design tools, ML algorithms, and sensors for on-detector data reduction, motivated by the technical challenges of current and future colliders. The drive to greater precision requires smaller pixel pitch, which together with higher event rates arising from pileup and/or beam-induced background generates petabytes of data...

    Go to contribution page
  54. Ms Ema Puljak (Universitat Autònoma de Barcelona)
    04/09/2025, 14:10
    Standard Talk

    We conduct a systematic study of quantum-inspired Tensor Network (TN) models—Matrix Product States (MPS) and Tree Tensor Networks (TTN)—for real-time jet tagging in high-energy physics, with a focus on low-latency deployment on FPGAs. Motivated by the strict computational demands of the HL-LHC Level-1 Trigger system, we explore TN architectures as compact and interpretable alternatives to deep...

    Go to contribution page
  55. Enrico Lupi (CERN, INFN Padova (IT))
    04/09/2025, 14:30
    Standard Talk

    Hadronic calorimeters are a key part of high energy physics experiments. Traditionally, they rely on high granularity to improve performance, but this leads to various challenges in terms of cost, energy consumption and output data volume. Moreover, current detectors do not have the capability of exploiting temporal information of the shower development, as the time frame for pattern...

    Go to contribution page
  56. Ho-Fung Tsoi (University of Pennsylvania)
    04/09/2025, 14:50
    Standard Talk

    Inference of standard convolutional neural networks (CNNs) on FPGAs often incurs high latency and long initiation intervals due to the nested loops required to slide filters across the full input, especially when the input dimensions are large. However, in some datasets, meaningful signals may occupy only a small fraction of the input, sometimes just a few percent of the total pixels or...

    Go to contribution page
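    The sparsity argument above can be sketched directly: if only a few pixels carry signal, most sliding-window positions see an all-zero patch and their multiply-accumulate work can be skipped. An illustrative toy (pure Python, not the contribution's FPGA design):

```python
# Sketch: skip convolution work for all-zero patches instead of
# sliding the filter across the entire input unconditionally.

def conv2d_skip_zeros(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    visited = 0  # positions where we actually did the arithmetic
    flat_kernel = [k for row in kernel for k in row]
    for i in range(out_h):
        for j in range(out_w):
            patch = [image[i + a][j + b]
                     for a in range(kh) for b in range(kw)]
            if not any(patch):
                continue  # all-zero patch: skip the multiply-accumulate
            visited += 1
            out[i][j] = sum(p * k for p, k in zip(patch, flat_kernel))
    return out, visited

# Mostly-empty 6x6 input with a single 2x2 signal region.
img = [[0] * 6 for _ in range(6)]
img[2][2], img[2][3], img[3][2], img[3][3] = 1, 2, 3, 4
out, visited = conv2d_skip_zeros(img, [[1, 0], [0, 1]])
```

    Here only the handful of patches overlapping the signal are computed, out of 25 sliding positions; on hardware the saving shows up as lower latency and a shorter initiation interval.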
  57. Abdelrahman Asem Elabd (University of Washington (US))
    04/09/2025, 15:10
    Standard Talk

    Reflection High-Energy Electron Diffraction (RHEED) is a common diffraction-based surface characterization technique for analyzing the properties of crystalline materials that are grown using a thin-film deposition technique like pulsed-laser deposition (PLD) or molecular-beam epitaxy (MBE). In this work, we design an FPGA-accelerated machine learning (ML) algorithm to perform real-time...

    Go to contribution page
  58. Lauri Antti Olavi Laatu (Imperial College (GB))
    04/09/2025, 16:00
    Standard Talk

    Transformers are state-of-the-art model architectures widely used across machine learning applications. However, the performance of such architectures is less well explored in the ultra-low-latency domains where deployment on FPGAs or ASICs is required. Such domains include the trigger and data acquisition systems of the LHC experiments.

    We present a transformer-based algorithm...

    Go to contribution page
  59. Katya Govorkova (Massachusetts Inst. of Technology (US))
    04/09/2025, 16:20
    Standard Talk

    The LHCb Upgrade II will operate at a data rate of 200 Tb/s, requiring efficient real-time data reduction. A major challenge of this pipeline is the transfer of full timing information from the frontend Electromagnetic Calorimeter (ECAL) to the backend for processing, which is critical for resolving pile-up, background suppression, and enhancing energy resolution. Due to the data rate, full...

    Go to contribution page
  60. Jure Vreča
    04/09/2025, 16:40

    We give an introduction to chisel4ml, a tool for generating direct circuit implementations of deeply quantized neural networks. It uses structural descriptions of deeply quantized neural networks in the form of Chisel generators. Chisel is a domain-specific language for designing synchronous digital circuits. It is a language embedded in Scala that offers a wealth of powerful features, such...

    Go to contribution page
  61. Maciej Mikolaj Glowacki (CERN), Marius Köppel (ETH Zurich (CH))
    04/09/2025, 17:00
    Topical session

    We present an MLOps-based approach for managing the end-to-end lifecycle of machine learning (ML) algorithms deployed on FPGAs in real-time trigger systems, as used in experiments such as CMS and ATLAS. The primary objective of this pipeline is to enable agile and robust responses to evolving detector and beam conditions by automating the collection of new training data, retraining and...

    Go to contribution page
  62. Yaman Umuroglu
    04/09/2025, 17:00
    Topical session

    QONNX (Quantized ONNX) serves as a shared input representation and frontend for several efficient inference projects, including FINN, chisel4ml and NN2FPGA. This birds-of-a-feather session would serve as a gathering point for the community to discuss recent developments and future plans for QONNX.

    Go to contribution page
  63. Richard Stotz (Google Zurich)
    05/09/2025, 09:00
    Invited Talks

    Decision Forests such as Random Forests and Gradient Boosted Trees are an effective and widely used class of models for machine learning, particularly for tabular data and forecasting. This talk covers the practical use and ongoing research on Decision Forests at Google. We provide a brief overview of decision forest modeling with a focus on novel split conditions. We will analyze their impact...

    Go to contribution page
  64. Mathieu Guillame-Bert (Google Zurich)
    05/09/2025, 09:20
    Invited Talks

    Graph Neural Networks (GNNs) are a powerful paradigm for neural network models that operate on relational data or data with structural information. This talk explores the practical use and ongoing research on GNNs at Google for industrial applications. We provide a brief overview of GNN modeling, including GCNs, Graph Transformers, and geometry-aware models. Then we discuss a variety of...

    Go to contribution page
  65. Isabel Haide (Karlsruhe Institute for Technology)
    05/09/2025, 10:15
    Standard Talk

    With increasing beam background levels at Belle II, which have already been observed due to the world-record instantaneous luminosities achieved by SuperKEKB and which are expected to rise further, an upgrade of the current Level 1 (L1) trigger algorithms is necessary to handle the evolving conditions. In this work, we present an upgraded L1 electromagnetic calorimeter trigger, based on Graph...

    Go to contribution page
  66. Mohamed Elashri (University of Cincinnati)
    05/09/2025, 10:35
    Standard Talk

    The PVFinder algorithm employs a hybrid deep neural network (DNN) approach to reconstruct primary vertices (PVs) in proton-proton collisions at the LHC, addressing the complexities of high pile-up environments in LHCb and ATLAS experiments. By integrating fully connected layers with a UNet architecture, PVFinder’s end-to-end tracks-to-hist DNN processes charged track parameters to predict PV...

    Go to contribution page
  67. Hannah Binney
    05/09/2025, 10:55
    Contributed Talks
    Standard Talk

    The Project 8 experiment aims to directly probe the neutrino mass by precisely measuring the energy spectrum of beta electrons emitted in the decay of tritium. The collaboration has pioneered the cyclotron radiation emission spectroscopy technique (CRES), which measures the energy of single electrons by detecting the cyclotron radiation they emit in a magnetic field. Traditional methods for...

    Go to contribution page
  68. 05/09/2025, 11:15
  69. Nhan Tran (Fermi National Accelerator Lab. (US))
    05/09/2025, 11:16
  70. Benjamin Ramhorst (ETH Zurich), Denis-Patrick Odagiu (ETH Zurich (CH)), Marius Köppel (ETH Zurich (CH))
    05/09/2025, 11:55
  71. Benjamin Ramhorst (ETH Zurich), Jan-Frederik Schulte (Purdue University (US))
    05/09/2025, 14:00

    For minutes of the discussion, see https://indico.cern.ch/event/1586270/

    Go to contribution page
  72. Felix Bachmair (Dectris Ltd.)
    Posters
    Poster

    Ptychographic imaging generates high-resolution datasets at the cost of heavy computational complexity, limiting its use in real-time experimental decision-making. In this cross-institutional effort, we introduce a hybrid edge-to-cloud workflow that delivers fast feedback for ptychography experiments by combining a modern synchrotron beamline at Diamond Light Source I13-1, featuring an...

    Go to contribution page
  73. Ameth Thiam
    1. Introduction and Context

    With the rise of cyberattacks and the growing volume of network traffic, intrusion detection systems (IDS) must provide fast, accurate, and resource-efficient analysis. Traditional CPU- or GPU-based solutions often struggle to meet low-latency and low-power requirements, especially in embedded environments.

    Integrating artificial intelligence, particularly...

    Go to contribution page
  74. N Ramakrishnan (Associate Professor, Monash University Malaysia)

    Quartz Crystal Microbalance (QCM) sensors are renowned for their high sensitivity to mass changes, making them ideal for detecting environmental parameters such as relative humidity (RH) and ultraviolet (UV) radiation. In this work, we present an AI-driven, dual-sided coated QCM sensor integrated with advanced machine learning (ML) and implemented on a real-time hardware platform. This sensor...

    Go to contribution page
  75. Mustofa Abdulhafiz Ahmed

    Deploying ML models today requires deep expertise in both hardware and software optimization. It often involves laborious trial-and-error to determine the right combination of tools, techniques, and configurations. While industry and academia benefit from a wide array of deployment frameworks and automation tools, the High-Energy Physics (HEP) community still faces major challenges in adopting...

    Go to contribution page
  76. Jure Vreča

    Chisel4ml is a tool we developed for generating fast implementations of deeply quantized neural networks. The tool has a Python frontend and a Chisel backend. The Python frontend serves as an interface to the Python ecosystem for training neural networks. The Chisel backend consists of hardware generators written in the Chisel Hardware Construction Language. This is a language embedded in...

    Go to contribution page
  77. Benjamin Ramhorst (ETH Zurich), Gustavo Alonso (ETH Zurich), Maximilian Jakob Heer (ETH Zurich)
    Tutorials

    As Moore’s Law and Dennard Scaling reach their limits, computing is shifting toward heterogeneous hardware for large-scale data processing. Cloud vendors are deploying accelerators, like GPUs, DPUs, and FPGAs, to meet growing computational demands of ML and big data.

    While FPGAs offer great flexibility and performance,...

    Go to contribution page
  78. Dr Raja Selvam

    Chemical Vapor Deposition (CVD) optimization is critical for advancing thin-film quality and process efficiency in semiconductor and optoelectronic applications, yet traditional methods like Computational Fluid Dynamics (CFD) simulations and empirical tuning are often computationally intensive or lack adaptability. To address this challenge, this study presents a data-driven machine learning...

    Go to contribution page
  79. Mr Andrei Girjoaba (ETH Zurich)

    FPGAs are performant, flexible chips well suited to experimental physics: they can efficiently run anomaly detection algorithms and help identify potential new physical phenomena. However, FPGAs are not easy to program: a significant gap exists between the algorithms used to discover new physics and the low-level hardware description languages (HDLs) required to program FPGAs. To tackle the...

    Go to contribution page
  80. Abhijith Gandrakota (Fermi National Accelerator Lab. (US))

    Transformers excel at modeling correlations in LHC collisions but incur high costs from quadratic attention. We analyze the Particle Transformer using attention maps and pair correlations on the (η,ϕ) plane, revealing that Particle Transformer attention maps learn traditional jet substructure observables. To improve efficiency we benchmark linear attention variants on JetClass and find that...

    Go to contribution page
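    The efficiency gain from linear attention comes from matrix associativity: once the softmax is replaced by a feature-map kernel, $(QK^T)V$ can be reordered as $Q(K^TV)$, avoiding the $n \times n$ score matrix entirely. A toy sketch of this reordering (an illustration only, not the Particle Transformer; the matrices and helper names are made up for demonstration):

    ```python
    # Toy illustration of why linear attention reduces cost: with a
    # feature-map kernel and no softmax, (Q K^T) V equals Q (K^T V),
    # so the O(n^2 d) score matrix is replaced by O(n d^2) work.
    # Matrices are plain nested lists (rows of floats).

    def matmul(A, B):
        """Multiply two matrices given as lists of rows."""
        return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
                for row in A]

    def transpose(A):
        return [list(col) for col in zip(*A)]

    def quadratic_attention(Q, K, V):
        # Builds the full n x n score matrix explicitly.
        return matmul(matmul(Q, transpose(K)), V)

    def linear_attention(Q, K, V):
        # Exploits associativity: K^T V is only d x d.
        return matmul(Q, matmul(transpose(K), V))

    Q = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
    K = [[0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
    V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

    out_quad = quadratic_attention(Q, K, V)
    out_lin = linear_attention(Q, K, V)
    # Both orderings give the same result once the softmax is removed.
    assert all(abs(a - b) < 1e-9
               for ra, rb in zip(out_quad, out_lin)
               for a, b in zip(ra, rb))
    ```

    For sequence length $n$ and feature dimension $d$, the first ordering costs $O(n^2 d)$ while the second costs $O(n d^2)$, which is the advantage linear attention variants exploit when $n \gg d$.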
  81. Davide Valsecchi (ETH Zurich (CH))

    Efficient data processing with machine learning relies on heterogeneous computing approaches, but optimizing input and output data movements remains a challenge. In GPU-based workflows the data already resides in GPU memory, but machine learning models require the input and output data in a specific tensor format, often forcing unnecessary copies off the GPU device and...

    Go to contribution page
  82. Berk Turk (Middle East Technical University (TR))

    The Alpha Magnetic Spectrometer (AMS-02) is a precision high-energy cosmic-ray experiment consisting of a Transition Radiation Detector (TRD), Silicon Tracker, Magnet, Time of Flight (ToF), Ring Imaging Cherenkov Detector (RICH), Anti-Coincidence Counter (ACC), and Electromagnetic Calorimeter (ECAL). Operating on the ISS since 2011, it has collected more than 240 billion cosmic-ray events. Among...

    Go to contribution page
  83. David Degen (ETH Zurich, Queloz Group)

    Small ($R<4\,\mathrm{R}_{\oplus}$), long-period ($30\,\mathrm{days}<P$) exoplanets with low equilibrium temperatures are an extremely interesting population, promising insights into planet formation, atmospheric chemistry and evolution, as well as habitability. However, for these planets, the current observing strategy of NASA's Transiting Exoplanet Survey Satellite (TESS) can only capture...

    Go to contribution page
  84. Yuan-Tang Chou (University of Washington (US))

    With the increasing size of machine learning (ML) models and vast datasets, foundation models have transformed how we apply ML to solve real-world problems. Multimodal language models like ChatGPT and Llama have extended their capabilities to specialized tasks from a common pre-training. Similarly, in high-energy physics (HEP), common tasks in analysis face recurring challenges that demand...

    Go to contribution page
  85. Jovan Mitrevski (Fermi National Accelerator Lab. (US))

    Since version 1.0, hls4ml has provided a oneAPI backend for Altera FPGAs, as an evolution of the backend that targeted Intel HLS. Some design choices will be presented here, including the use of pipes and task sequences to develop a dataflow-style architecture. The oneAPI framework, unlike the Intel HLS framework, also naturally supports an accelerator-style deployment. Using always-run...

    Go to contribution page
  86. Maira Khan (Fermi National Accelerator Laboratory)

    We present the development of a machine learning (ML) based regulation system for third-order resonant beam extraction in the Mu2e experiment at Fermilab. Classical and ML-based controllers have been optimized using semi-analytic simulations and evaluated in terms of regulation performance and training efficiency. We compare several controller architectures and discuss the integration of...

    Go to contribution page
  87. Rukshak Kapoor

    Medical imaging is foundational to clinical diagnostics and biomedical research, enabling the identification and monitoring of a wide range of conditions—from pulmonary diseases to cancer. However, the development of high-performance AI diagnostic systems is often hampered by restricted access to large, diverse, and well-annotated imaging datasets. This limitation is particularly acute for...

    Go to contribution page
  88. Sharvaree Vadgama (University of Amsterdam), Julia Balla (MIT), Ryley McConkey (MIT)

    Introduction
    Accurate climate prediction hinges on the ability to resolve multi-scale turbulent dynamics in the atmosphere and oceans [1]. An important mechanism of energy exchange between the ocean and the atmosphere is mesoscale turbulence, which contains motions of length scale $\mathcal{O}(100\,\mathrm{km})$. Two-layer quasi-geostrophic (QG) simulations [2] are a popular technique for...

    Go to contribution page
  89. Nicolo Ghielmetti (CERN), Yaman Umuroglu (AMD Research)

    The rising popularity of large language models (LLMs) has led to a growing demand for efficient model deployment. In this context, the combination of post-training quantization (PTQ) and low-precision floating-point formats such as FP4, FP6 and FP8 has emerged as an important technique, allowing for rapid and accurate quantization with the ability to capture outlier values in LLMs without...

    Go to contribution page
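    The idea behind post-training quantization to a format like FP4 (E2M1) can be sketched in a few lines: scale the tensor so its largest magnitude lands on the largest representable value, then round each entry to the nearest point of the coarse FP4 grid. This is a hedged toy (the per-tensor scaling and helper name are assumptions for illustration; production PTQ flows are considerably more involved):

    ```python
    # Toy sketch of PTQ onto an FP4-like (E2M1) value grid. The grid below
    # lists the non-negative magnitudes representable in E2M1; sign is
    # handled separately. Illustration only, not a production quantizer.

    FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # E2M1 magnitudes

    def quantize_fp4(values):
        """Scale so the max magnitude maps to 6 (the top of the grid),
        then round each entry to the nearest representable FP4 value."""
        amax = max(abs(v) for v in values) or 1.0
        scale = amax / FP4_GRID[-1]
        out = []
        for v in values:
            mag = min(FP4_GRID, key=lambda g: abs(abs(v) / scale - g))
            out.append((mag if v >= 0 else -mag) * scale)
        return out

    w = [0.01, -0.3, 0.7, 1.2, -2.4]
    wq = quantize_fp4(w)
    # The largest-magnitude entry round-trips (it sits exactly on the grid).
    assert abs(wq[-1] + 2.4) < 1e-9
    ```

    The non-uniform spacing of the E2M1 grid (dense near zero, sparse at the extremes) is what lets low-precision floating point capture outlier values that a uniform integer grid of the same bit width would clip or crush.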
  90. Andrew Whitbeck (Fermi National Accelerator Lab. (US)), Ben Hawks (Fermi National Accelerator Lab)

    Modern development flows that use tooling for automated building, testing, and deployment of software are becoming the norm for large-scale software and hardware projects. These flows offer advantages that make them desirable, but implementing them for projects that use FPGAs introduces complications when integrating them with traditional FPGA...

    Go to contribution page
  91. Lorenzo Asfour (ETH Zurich (CH))
    Standard Talk

    High-resolution electron microscopy generates large volumes of pixel detector data due to beam rates reaching $10^7$ to $10^{10}$ electrons per second directed at the sample. Of this data, only the electron entry point into the silicon detector prior to scattering is typically of interest for downstream analysis. Precise knowledge of these entry points is particularly important in electron...

    Go to contribution page
  92. Piero Viscone (CERN & University of Zurich (CH))

    In preparation for the High Luminosity LHC (HL-LHC) run, the CMS experiment is developing a major upgrade of its Level-1 (L1) Trigger system, which will integrate high-granularity calorimeter data and real-time tracking using FPGA-based processors connected via a high-bandwidth optical network. A central challenge is the identification of electrons in a high pileup environment within strict...

    Go to contribution page
  93. ATLAS Collaboration

    The High-Luminosity LHC (HL-LHC) will provide an order of magnitude increase in integrated luminosity and enhance the discovery reach for new phenomena. The increased pile-up foreseen during the HL-LHC necessitates major upgrades to the ATLAS detector and trigger. The Phase-II trigger will consist of two levels, a hardware-based Level-0 trigger and an Event Filter (EF) with tracking...

    Go to contribution page
  94. ATLAS TDAQ collaboration, Lucas Bezio (Universite de Geneve (CH))

    Deep Sets-based neural networks are well-suited to learning from unordered, variable-length inputs such as particle tracks associated with jets. Their permutation-invariant structure makes them attractive for high-energy physics (HEP) applications where input ordering is ambiguous and throughput is a critical constraint. In this work, we explore the use of such architectures on...

    Go to contribution page
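    The permutation invariance that makes Deep Sets attractive for unordered track collections follows from their structure: a per-element network phi, a symmetric pooling (here a sum), and a readout network rho. A minimal sketch (a toy with hand-written phi and rho, not the network evaluated in this contribution):

    ```python
    # Minimal Deep Sets sketch: f(X) = rho(sum_i phi(x_i)). Because sum
    # pooling is symmetric, the output does not depend on input ordering.
    import random

    def phi(x):
        """Per-element embedding: here a fixed nonlinear feature map."""
        return [x, x * x, abs(x)]

    def rho(pooled):
        """Post-pooling readout: here a fixed weighted sum of features."""
        weights = [0.5, -0.25, 1.0]
        return sum(w * p for w, p in zip(weights, pooled))

    def deep_set(elements):
        pooled = [sum(col) for col in zip(*(phi(x) for x in elements))]
        return rho(pooled)

    tracks = [0.3, -1.2, 2.5, 0.0]      # e.g. one feature per track
    shuffled = tracks[:]
    random.shuffle(shuffled)
    # Sum pooling makes the output independent of the input ordering.
    assert abs(deep_set(tracks) - deep_set(shuffled)) < 1e-9
    ```

    In a real model phi and rho would be small trained networks, but the invariance argument is identical: any symmetric pooling between them yields an order-independent output.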
  95. Jiahui Zhuo (Univ. of Valencia and CSIC (ES))

    The LHCb experiment at CERN operates a fully software-based first-level trigger that processes 30 million collision events per second, with a data throughput of 4 TB/s. Real-time tracking—reconstructing particle trajectories from raw detector hits—is essential for selecting the most interesting events, but must be performed under tight latency and throughput constraints.
    A key bottleneck in...

    Go to contribution page
  96. Abhishikth Mallampalli (University of Wisconsin Madison (US)), Lino Oscar Gerlach (Princeton University (US))
    Invited Talks
    Poster

    The CICADA (Calorimeter Image Convolutional Anomaly Detection Algorithm) project aims to detect anomalous physics signatures without bias from theoretical models in proton-proton collisions at the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider. CICADA identifies anomalies in low-level calorimeter trigger data using a convolutional autoencoder, whose behavior is...

    Go to contribution page
  97. Benjamin Ramhorst (ETH Zurich)
    Tutorials

    In this tutorial, you will get familiar with the hls4ml library. This library converts pre-trained Machine Learning models into FPGA firmware, targeting extreme low-latency inference. You will learn techniques for model compression, including how to reduce the footprint of your model using state-of-the-art techniques such as quantization. Finally, you will learn how to synthesize your model...

    Go to contribution page
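    The quantization step in such a flow typically maps weights and activations onto a fixed-point grid of the ap_fixed<W,I> kind (W total bits, I integer bits). A back-of-the-envelope sketch of that representation (an illustration of the compression idea, not hls4ml's actual code path; the helper name to_fixed is invented):

    ```python
    # Sketch of ap_fixed<W,I> rounding: the step size is 2**(I - W) and the
    # representable range is [-2**(I-1), 2**(I-1) - step]. Values are
    # rounded to the nearest step and saturated at the range edges.

    def to_fixed(x, total_bits=8, int_bits=3):
        step = 2.0 ** (int_bits - total_bits)
        lo = -(2.0 ** (int_bits - 1))
        hi = 2.0 ** (int_bits - 1) - step
        q = round(x / step) * step      # round to nearest representable step
        return min(max(q, lo), hi)      # saturate to the representable range

    # ap_fixed<8,3>: step 1/32, range [-4.0, 3.96875]
    assert to_fixed(0.7) == 0.6875      # 22/32, nearest grid point to 0.7
    assert to_fixed(10.0) == 3.96875    # saturates at the top of the range
    ```

    Shrinking W (and hence the step and range) is the main lever for reducing FPGA resource usage, at the price of quantization error that has to be validated against the floating-point model.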
  98. Erdem Yigit Ertorer (Carnegie-Mellon University (US))

    The Large Hadron Collider (LHC) will soon undergo a high-luminosity (HL) upgrade to improve future searches for new particles and to measure particle properties with increased precision. The upgrade is expected to provide a dataset ten times larger than the one currently available by the end of its data-taking period. The increased beam intensity will also increase the number of simultaneous...

    Go to contribution page
  99. Andrei Girjoaba

    As Moore’s Law comes to an end, domain-specific architectures (DSA) are considered the next direction for performance improvements in compute. Unfortunately, the development environment of DSAs falls short in comparison to that of general-purpose architectures (e.g., CPUs). The transition from general-purpose to DSA is hindered by the fact that software engineers lack the knowledge to...

    Go to contribution page
  100. Dr Amine Haboub (Qatar Environment and Energy Research Institute)

    The inverse design of photonic surfaces produced by high-throughput femtosecond laser processing is limited by a strongly non-linear, many-to-one mapping from laser parameters (power, speed, hatch spacing) to the resulting optical spectrum. Tandem Neural Networks (TNNs) mitigate this ill-posedness by pairing a forward surrogate with a separately trained inverse network, but they still rely on...

    Go to contribution page
  101. Yuan-Tang Chou (University of Washington (US))

    Charged particle track reconstruction is the foundation of collider experiments, yet it is also the most computationally expensive part of particle reconstruction. Innovation in track reconstruction using graph neural networks (GNNs) has demonstrated a promising capability to address the computing challenges posed by the High-Luminosity LHC (HL-LHC) with machine learning....

    Go to contribution page
  102. Tommaso Baldi (Scuola Superiore Sant'Anna), Dr Tran Nhan (Fermi National Accelerator Laboratory, Batavia, IL, USA)

    In this paper, we propose a method to perform empirical analysis of the loss landscape of machine learning (ML) models. The method is applied to two ML models for scientific sensing, which necessitate quantization to be deployed and are subject to noise and perturbations due to experimental conditions.
    Our method allows assessing the robustness of ML models to such effects as a function of...

    Go to contribution page
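    The flavour of empirical loss-landscape probing described above can be illustrated in miniature: perturb the weights of a trained model with noise of increasing scale and record how fast the loss degrades, a flat minimum degrading more slowly than a sharp one. A toy sketch (our own illustration on a one-parameter model, not the paper's method):

    ```python
    # Toy loss-landscape probe: average the loss under random weight
    # perturbations of growing scale sigma around a known minimizer.
    import random

    def loss(w, data):
        # Least-squares loss for a 1-parameter linear model y = w * x.
        return sum((w * x - y) ** 2 for x, y in data) / len(data)

    random.seed(0)
    data = [(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]
    w_star = 2.0                  # exact minimizer for this synthetic data

    def sharpness(sigma, trials=200):
        """Average loss under Gaussian weight perturbations of scale sigma."""
        total = 0.0
        for _ in range(trials):
            total += loss(w_star + random.gauss(0.0, sigma), data)
        return total / trials

    # A flatter landscape keeps the loss low under larger perturbations;
    # here the expected loss grows with the perturbation scale.
    assert loss(w_star, data) == 0.0
    assert sharpness(0.1) < sharpness(0.5)
    ```

    For quantized models deployed in noisy experimental conditions, a curve of this average degradation versus perturbation scale is one concrete way to compare the robustness of candidate architectures.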
  103. Ben Hawks (Fermi National Accelerator Lab)

    Benchmarks are a cornerstone of modern machine learning practice, providing standardized evaluations that enable reproducibility, comparison, and scientific progress. Yet, as AI systems — particularly deep learning models — become increasingly dynamic, traditional static benchmarking approaches are losing their relevance. Models rapidly evolve in architecture, scale, and capability;...

    Go to contribution page
  104. Rajat Gupta (University of Pittsburgh (US))

    We present NomAD (Nanosecond Anomaly Detection), a real-time anomaly detection algorithm designed for the ATLAS Level-1 Topological (L1Topo) trigger using unsupervised machine learning. The algorithm combines a Variational Autoencoder (VAE) with Boosted Decision Tree (BDT) regression to compress and distill deep learning inference into a firmware-compatible format for FPGAs. Trained on 2024...

    Go to contribution page
  105. Gila Fruchter

    Recent advances in machine learning have raised ethical concerns in both industry and academia regarding the uncontrollable diffusion of AI and the diminishing human capacity to oversee its impacts. These concerns underscore the need for regulatory and design approaches that maintain human oversight in AI-driven decision-making. Keeping humans in the loop is essential for auditing, fairness,...

    Go to contribution page
  106. Sara Marques (UniBe)

    Understanding the diversity and structure of planetary systems requires capturing not only the properties of individual planets but also the statistical relationships between planets within the same system and their interaction with the host star.

    Traditional population synthesis models, such as the Bern model, provide physically motivated insights into these correlations, but their...

    Go to contribution page
  107. Hector Gutierrez Arance (Univ. of Valencia and CSIC (ES))

    The escalating demand for data processing in particle physics research has spurred the exploration of novel technologies to enhance the efficiency and speed of calculations. This study presents the development of an FPGA implementation of MADGRAPH, a widely used tool in particle collision simulations, using High-Level Synthesis. The work is a proof of concept limited to a single,...

    Go to contribution page
  108. Poster
  109. Siwar Jose Basualdo Garcia

    The accelerated retreat of tropical glaciers in the Peruvian Andes poses an imminent and catastrophic threat of Glacial Lake Outburst Floods (GLOF). These events can devastate downstream communities like Huaraz with warning times of less than 15 minutes [1]. Existing monitoring systems are inadequate for this challenge; optical satellite observations (e.g. Landsat, Sentinel-2) are frequently...

    Go to contribution page
  110. Mr Hao-Chun Liang (Institute of Pioneer Semiconductor Innovation, National Yang Ming Chiao Tung University)

    As the era of the High-Luminosity Large Hadron Collider (HL-LHC) approaches, the GPU-accelerated High-Level Trigger (HLT) of the CMS experiment faces a stringent requirement to reduce the Level-1 readout stream from 100 kHz to 5 kHz, a twenty-fold decrease essential to adhere to archival bandwidth constraints [1], [2]. Meeting this demand necessitates highly efficient real-time...

    Go to contribution page
  111. Pothuraju Naveen Yadav (Delhi Technological University)

    Simulating relativistic orbital dynamics around Schwarzschild black holes is essential for understanding general relativity and astrophysical phenomena like precession. Traditional numerical solvers face difficulty while dealing with noisy or sparse data, necessitating data-driven approaches. We develop a Scientific Machine Learning (SciML) framework to model orbital trajectories and...

    Go to contribution page
  112. Christina Reissel (Massachusetts Inst. of Technology (US)), Maira Khan (Fermi National Accelerator Laboratory)

    We investigate the application of state space models (SSMs) to a diverse set of scientific time series tasks. In particular, we benchmark the performance of SSMs against a set of baseline neural networks across three domains: magnet quench prediction, gravitational wave signal classification (LIGO), and neural phase estimation. Our analysis evaluates both computational efficiency—quantified by...

    Go to contribution page
  113. Tanguy Dietrich

    Cherenkov Telescope cameras stream about one billion frames per second and are dominated by night-sky background, yet the γ-ray air-shower patterns of interest appear only occasionally.
    Filtering is thus paramount to guarantee that science-grade data are recorded without saturating the downstream read-out.
    In this work we present TDSCAN (Trigger Distributed Spatial Convolution Area...

    Go to contribution page
  114. Purabi Mazumdar (Centre for Research in Biotechnology for Agriculture, Universiti Malaya, Kuala Lumpur, Malaysia)

    Pak choi (Brassica rapa subsp. chinensis) is a leafy green vegetable widely cultivated in vertical urban farming systems due to its rapid growth and high yield under compact, hydroponic setups. However, even in these controlled environments, crops remain susceptible to various diseases. Among the most common threats are fungal infections such as Alternaria leaf spot and powdery mildew, and...

    Go to contribution page
  115. João Paulo De Souza Böger

    Complex simulators are central to scientific research, forecasting, and real-world applications. However, they often require intensive computational resources and suffer from scalability issues — challenges amplified in the big data era. The APEX project addresses these limitations by designing novel efficient architectures for scientific simulators, exploring inductive biases, causal...

    Go to contribution page
  116. Olivia Dalager (Fermilab)

    Processing the large volumes of data produced by liquid argon time projection chamber (LArTPC) experiments presents a significant challenge, especially those at the scale of the Deep Underground Neutrino Experiment (DUNE). This is a particular challenge when aiming to trigger on low-energy neutrinos from core-collapse supernovae, which are typically buried in a high-rate radiological...

    Go to contribution page
  117. ATLAS Collaboration

    Graph Neural Networks (GNNs) have been a focus of machine-learning-based track reconstruction for high-energy physics experiments in recent years. Within ATLAS, the GNN4ITk group has investigated this type of algorithm for track reconstruction at the High-Luminosity LHC (HL-LHC) using the future all-silicon Inner Tracker (ITk).

    The Event Filter (EF) is part of the ATLAS Trigger...

    Go to contribution page
  118. Ben Hawks (Fermi National Accelerator Lab)

    As machine learning (ML) is increasingly implemented in hardware to address real-time challenges in scientific applications, the development of advanced toolchains has significantly reduced the time required to iterate on various designs. These advancements have solved major obstacles, but also exposed new challenges. For example, processes that were not previously considered bottlenecks,...

    Go to contribution page
  119. Invited Talks
  120. Cilicia Uzziel Perez (La Salle, Ramon Llull University (ES))

    Graph Neural Networks (GNNs) have become promising candidates for particle reconstruction and identification in high-energy physics, but their computational complexity makes them challenging to deploy in real-time data processing pipelines. In the next-generation LHCb calorimeter, detector hits—characterized by energy, position, and timing—can be naturally encoded as node features, with...

    Go to contribution page
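    The encoding described above, with detector hits as graph nodes, is processed by repeated message-passing steps. A toy single step (an illustrative sketch, not the LHCb model: each hit carries (energy, x, y) features, and a node's update adds the mean of its neighbours' features to its own):

    ```python
    # Toy message-passing step on a hit graph: aggregate each node's
    # neighbour features by mean, then add the aggregate to the node.

    def message_passing_step(features, edges):
        n = len(features)
        neighbors = {i: [] for i in range(n)}
        for a, b in edges:                   # undirected edges
            neighbors[a].append(b)
            neighbors[b].append(a)
        updated = []
        for i in range(n):
            if neighbors[i]:
                agg = [sum(features[j][k] for j in neighbors[i]) / len(neighbors[i])
                       for k in range(len(features[i]))]
            else:
                agg = [0.0] * len(features[i])
            updated.append([f + a for f, a in zip(features[i], agg)])
        return updated

    hits = [[5.0, 0.0, 0.0], [3.0, 1.0, 0.0], [1.0, 0.0, 1.0]]  # (E, x, y)
    edges = [(0, 1), (0, 2)]
    out = message_passing_step(hits, edges)
    # Node 0 aggregates the mean of nodes 1 and 2: (2.0, 0.5, 0.5).
    assert out[0] == [7.0, 0.5, 0.5]
    ```

    A trained GNN replaces the fixed mean aggregation and additive update with learned functions, but the per-step cost is the same: linear in the number of edges, which is what makes graph sparsity the key knob for real-time deployment.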