25–29 May 2026
Chulalongkorn University
Asia/Bangkok timezone

AIE4ML: Leveraging Versal AI Engines to Enable More Expressive Real-Time ML Models for Next-Generation Trigger Systems

25 May 2026, 14:57
18m
Chulalongkorn University

Oral Presentation | Track 2 - Online and real-time computing

Speaker

Dimitrios Danopoulos (CERN)

Description

Modern particle-physics experiments increasingly rely on machine learning (ML) to perform real-time data reduction under the extreme conditions of the High-Luminosity LHC (HL-LHC). Hardware-trigger inference must satisfy microsecond-level latency, deterministic execution, and tight on-chip memory constraints. FPGA-based deployments can meet these requirements for small, highly parallelized models. However, scaling to deeper or wider architectures remains challenging due to resource limitations, manual design effort, and the lack of automated compilation flows. Frameworks such as hls4ml have enabled compact neural-network deployments in current trigger systems but also illustrate the difficulty of supporting larger and more expressive models on conventional FPGA fabrics.

In this work, we introduce AIE4ML, a compilation and optimization framework designed to seamlessly use the AI Engine arrays (AIE-ML/AIE-MLv2) of AMD Versal devices for low-latency ML inference. As part of ongoing Next-Generation Trigger (NGT) R&D efforts, AIE4ML extends the hls4ml ecosystem with support for the Versal AI Engines. These devices offer a particularly interesting architectural compromise between FPGA and GPU platforms: a deterministic VLIW-SIMD architecture that allows compile-time (static) instruction scheduling and software-managed local memory. The AIE architecture is well matched to several classes of low-latency ML models of interest to HEP, including, but not limited to, those that can be expressed as structured collections of matrix–vector or matrix–matrix operations, such as components of particle-flow networks, MLP-Mixers, and trigger-oriented classifiers.

For demonstration, we evaluate quantized models imported from high-level frameworks, focusing on linear submodules (e.g., those extracted from MLP-Mixer–style architectures), and we showcase throughput comparable to GPUs while respecting HL-LHC-scale latency budgets.
Compared to FPGA implementations of similar models on large dense workloads, AIE4ML achieves order-of-magnitude performance gains, reaching up to a ~13× speed-up in some cases. This suggests that Versal AI Engines may enable more expressive and computationally intensive ML models in future real-time trigger systems.
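To make the workload class concrete, the sketch below expresses an MLP-Mixer-style token-mixing sublayer as the kind of structured matrix–matrix operation the abstract describes, with symmetric int8 quantization and int32 accumulation as is typical for fixed-point accelerators. All shapes, the quantization scheme, and the random stand-in weights are illustrative assumptions, not details of AIE4ML itself.

```python
import numpy as np

# Hypothetical shapes for illustration only.
TOKENS, CHANNELS, HIDDEN = 16, 8, 32

rng = np.random.default_rng(0)

def quantize(x, scale=127.0):
    """Symmetric int8 quantization (a common scheme; the scheme
    actually used by the framework is not specified here)."""
    return np.clip(np.round(x * scale), -128, 127).astype(np.int8)

# Random stand-in weights; a real deployment imports a trained,
# quantized model from a high-level framework.
w1 = quantize(rng.standard_normal((HIDDEN, TOKENS)) * 0.1)
w2 = quantize(rng.standard_normal((TOKENS, HIDDEN)) * 0.1)
x = quantize(rng.standard_normal((TOKENS, CHANNELS)) * 0.1)

# Token mixing: apply a small MLP along the token dimension for every
# channel. Accumulate in int32, as fixed-point hardware typically does.
h = np.maximum(w1.astype(np.int32) @ x.astype(np.int32), 0)  # ReLU
y = w2.astype(np.int32) @ h

print(y.shape)  # same (TOKENS, CHANNELS) shape as the input block
```

Because the whole sublayer reduces to two dense matrix–matrix products with statically known shapes, it maps naturally onto an architecture with compile-time instruction scheduling and software-managed local memory.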

Co-authors

Enrico Lupi (CERN, University of Padova), Chang Sun (California Institute of Technology (US)), Roope Oskari Niemi, Anastasiia Petrovych (CERN), Sebastian Dittmeier (Ruprecht-Karls-Universitaet Heidelberg (DE)), Michael Kagan (SLAC National Accelerator Laboratory (US)), Vladimir Loncar (University of Belgrade (RS)), Maurizio Pierini (CERN)
