28th Conference on Computing in High Energy and Nuclear Physics (CHEP 2026)

Start: 2026-05-25T08:00:00+07:00
End: 2026-05-29T14:00:00+07:00
Location: Chulalongkorn University

Asia/Bangkok timezone

An End-to-End, Unified Workflow for Sub-Microsecond Inference on FPGAs

25 May 2026, 14:39

18m

MHMK M02

Oral Presentation, Track 2 - Online and real-time computing

Speaker: Dimitrios Danopoulos (CERN)

Real-time inference with sub-microsecond latency is critical for the Level-1 trigger (L1T) systems at the High-Luminosity LHC. We present an end-to-end, open-source framework that spans model optimization, quantization, and FPGA deployment, translating high-level neural network or generic dataflow models into resource-efficient FPGA implementations.

Within this workflow, we introduce High-Granularity Quantization (HGQ), a quantization framework that jointly optimizes a model's accuracy and resource utilization through quantization-aware training with differentiable bitwidths, at training speeds comparable to native Keras. The framework supports both conventional matmul-based neural network architectures, from classical dense layers to multi-head attention blocks, and fabric-native architectures that map efficiently onto FPGA Look-Up Table (LUT) primitives. Users can adopt either architecture or combine both in a single model to balance accuracy, resource usage, and latency.
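HGQ treats bitwidths as trainable quantities, assumed here to be assigned per weight. Every extra bit lowers rounding error but costs FPGA resources. The snippet below is a minimal, hypothetical sketch of that trade-off. It is not HGQ's API or its gradient estimator: it replaces gradient-based training with an exhaustive per-weight search. For each weight it picks the number of fractional bits `f` that minimizes squared rounding error plus `beta` times an approximate bit cost. The names `quantize`, `bit_cost`, `choose_frac_bits`, and the parameter `beta` are all illustrative assumptions.

```python
import math

# Hypothetical illustration only: these names are not part of the HGQ API, and
# HGQ learns bitwidths by gradient descent rather than by the exhaustive search below.

def quantize(w: float, f: int) -> tuple[int, float]:
    """Round w to f fractional bits (round half up).
    Returns the integer mantissa m and the fixed-point value m * 2**-f."""
    m = math.floor(w * 2**f + 0.5)
    return m, m * 2.0**-f

def bit_cost(m: int) -> int:
    """Approximate signed fixed-point width: magnitude bits plus a sign bit.
    A weight that rounds to zero costs nothing, since it can be dropped."""
    return 0 if m == 0 else abs(m).bit_length() + 1

def choose_frac_bits(w: float, beta: float, max_f: int = 8) -> int:
    """Pick the f in [0, max_f] minimizing squared rounding error + beta * bit cost."""
    def objective(f: int) -> float:
        m, q = quantize(w, f)
        return (w - q) ** 2 + beta * bit_cost(m)
    return min(range(max_f + 1), key=objective)

if __name__ == "__main__":
    weights = [0.7431, -0.0123, 0.25, 1.3]
    for w in weights:
        f = choose_frac_bits(w, beta=1e-3)
        m, q = quantize(w, f)
        print(f"w={w:+.4f}  f={f}  q={q:+.6f}  bits={bit_cost(m)}")
```

With `beta=1e-3`, the small weight -0.0123 rounds to zero and is effectively pruned. The weights 0.7431 and 0.25 keep 2 fractional bits, and 1.3 keeps 4 (quantized to 1.3125). Raising `beta` can only keep or lower each weight's chosen bit cost. Summing per-weight bit counts is only a crude stand-in for real FPGA resource utilization.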

On the backend, we present da4ml, an HLS compiler that optimizes unrolled static dataflow graphs, such as machine learning models for L1T, and converts them into RTL firmware in Verilog or VHDL. In particular, it optimizes constant-matrix-vector multiplication (CMVM) operations into efficient adder graphs, enabling DSP-free implementations of a wide range of models. The package also provides a precise, compilation-free resource surrogate and bit-exact emulation of compiled models through a C++-based interpreter, enabling rapid design-space exploration and model validation.
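The abstract does not describe da4ml's internal algorithm. The sketch below is a hypothetical illustration of the general idea behind DSP-free CMVM. Once weights are quantized and scaled to integers, each constant can be rewritten in canonical signed-digit (CSD) form. Every product `c * x` then becomes a few shifted copies of `x` that are added or subtracted, with no multiplier required. Because the sketch uses exact integer arithmetic, its result matches the direct product bit for bit. The function names are assumptions, not da4ml's API. Real adder-graph optimizers typically go further by sharing common subexpressions across outputs, which this sketch does not attempt.

```python
# Hypothetical illustration of DSP-free constant multiplication; this is not
# da4ml's API or its optimization algorithm.

def csd_digits(c: int) -> list[tuple[int, int]]:
    """Canonical signed-digit (non-adjacent form) of integer c as (sign, shift)
    pairs with c == sum(sign * 2**shift); no two nonzero digits are adjacent."""
    digits, shift = [], 0
    while c != 0:
        if c & 1:
            sign = 2 - (c & 3)   # +1 if c % 4 == 1, -1 if c % 4 == 3
            digits.append((sign, shift))
            c -= sign            # c is now divisible by 4 (or zero)
        c >>= 1                  # exact: c is even here
        shift += 1
    return digits

def cmvm_shift_add(M: list[list[int]], x: list[int]) -> tuple[list[int], int]:
    """Compute y = M @ x for a constant integer matrix M using only shifts and
    additions/subtractions. Also returns a rough count of two-input adders."""
    y, adders = [], 0
    for row in M:
        acc, n_terms = 0, 0
        for c, xj in zip(row, x):
            for sign, shift in csd_digits(c):
                term = xj << shift   # shifting is just wiring in hardware
                acc = acc + term if sign > 0 else acc - term
                n_terms += 1
        y.append(acc)
        adders += max(n_terms - 1, 0)  # n terms are combined with n - 1 adders
    return y, adders

if __name__ == "__main__":
    M = [[3, -5], [7, 0]]   # e.g. quantized weights scaled to integers
    x = [11, -4]
    y, adders = cmvm_shift_add(M, x)
    direct = [sum(c * xj for c, xj in zip(row, x)) for row in M]
    assert y == direct       # exact integer arithmetic: bit-exact with the direct product
    print(y, adders)         # [53, 77] 4
```

CSD is used here because it minimizes the number of nonzero digits per constant, and therefore the number of adders, when each constant is handled in isolation.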

To ease adoption, HGQ and da4ml expose user-friendly APIs designed to work together. Both packages can also interface directly with hls4ml, so users can combine the strengths of all three frameworks and keep their existing workflows without friction.

Author: Chang Sun (California Institute of Technology (US))

Co-authors: Dimitrios Danopoulos (CERN), Maurizio Pierini (CERN)

Presentation materials: main.pdf
