15–19 Sept 2025
CERN
Europe/Zurich timezone

ML Operations (MLOps) for FPGA based trigger implementations

16 Sept 2025, 12:05
5m
40/S2-A01 - Salle Anderson (CERN)

5. Infrastructure for AI Deployment

Speakers

Ioannis Xiotidis (CERN), Thorsten Wengler (CERN)

Description

The growing demand for fast and reliable Machine Learning (ML) inference in the hardware triggers of High Energy Physics (HEP) experiments introduces new challenges in model development, deployment, and long-term sustainability. This proposal aims to develop a generic, CERN-wide ML Operations (MLOps) framework that provides end-to-end support for the lifecycle of ML models targeting Field Programmable Gate Arrays (FPGAs). From raw data generation in frameworks such as Athena and CMSSW to model deployment on in-detector firmware, the system will focus on improving traceability and accelerating iterations as detector conditions change. The framework will be modular, open to multiple experiments and toolchains (e.g. hls4ml, FINN, Vitis-AI), and prepared to scale to the High-Luminosity LHC (HL-LHC) and beyond.
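
As an illustration of what traceability across the model lifecycle could look like, the following is a minimal sketch of a lineage record that ties a dataset, a training configuration and a target toolchain together through content hashes. It is not taken from the proposal: the names `fingerprint`, `LineageRecord` and `make_record`, and the choice of fields, are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def fingerprint(payload: bytes) -> str:
    """Return a SHA-256 hex digest used as a content-addressed identifier."""
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical record linking the inputs that produced one model build."""

    dataset_hash: str  # hash of the training data bytes
    config_hash: str   # hash of the canonicalised training configuration
    toolchain: str     # free-form label in this sketch, e.g. "hls4ml"

    def record_id(self) -> str:
        # Serialise with sorted keys so the ID does not depend on field order.
        canonical = json.dumps(asdict(self), sort_keys=True).encode()
        return fingerprint(canonical)


def make_record(data: bytes, config: dict, toolchain: str) -> LineageRecord:
    """Build a record from raw data bytes and a JSON-serialisable config."""
    # Sorting keys makes logically identical configs hash identically.
    config_bytes = json.dumps(config, sort_keys=True).encode()
    return LineageRecord(fingerprint(data), fingerprint(config_bytes), toolchain)
```

Two builds that share data, configuration and toolchain then yield the same `record_id()`, while any change to one of the inputs produces a different identifier that can be followed back to its source.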

CERN group / Experiment

CERN ATLAS Team

Working area

Area 5: Infrastructure for AI Deployment

Project goals

This proposal targets the creation of a coherent infrastructure: a fully traceable, modular MLOps framework tailored to ML-to-FPGA deployment in HEP trigger systems.

- Establish an end-to-end pipeline
- Standardise data conversion layers
- Integrate metadata tracking
- Develop hooks for rapid model evaluation
- Remain toolchain-agnostic: support multiple back-ends
- Support hardware-in-the-loop evaluation: allow accelerated models to be validated and profiled directly in test-bed setups or on emulation platforms
- Enable partial reconfiguration or selective retraining: identify layers or models that can quickly absorb detector shifts without full retraining or synthesis

Timeline

Year 1:
- ROOT-to-NPZ Conversion Tool
- MLOps Data Pipeline Prototype
- FPGA-Compatible Model Packaging

Year 2:
- Conversion/Tracing Integration
- Alert System for ML Drift
- Retrain-ability Evaluation Framework

Year 3:
- Bit-Accurate SIM-to-HW Benchmarks
- Final MLOps Deployment Toolkit
- Transition to production system

Available person power

0.4 FTE

Additional person power request

36 GRAP months, 36 TECH months

Is this an already ongoing activity?

No

Indicative hardware resource needs

Access to evaluation systems, either existing or to be obtained
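
The Year 2 item "Alert System for ML Drift" could, for example, be built around a distribution-shift score computed on a model input or output. The sketch below uses the Population Stability Index (PSI) as one possible score. This is an assumption for illustration only, since the proposal does not name a metric. The function names, the clamping of out-of-range values to the end bins, and the 0.2 threshold (a common rule of thumb) are likewise hypothetical.

```python
import math
from typing import List, Sequence


def histogram_fractions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values per bin [edges[i], edges[i+1]); outliers go to the end bins."""
    if not values:
        raise ValueError("values must not be empty")
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = 0
        # Walk right until v falls below the next edge or the last bin is reached.
        while idx < len(counts) - 1 and v >= edges[idx + 1]:
            idx += 1
        counts[idx] += 1
    return [c / len(values) for c in counts]


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    edges: Sequence[float],
    eps: float = 1e-6,
) -> float:
    """PSI = sum over bins of (cur - ref) * ln(cur / ref)."""
    ref = histogram_fractions(reference, edges)
    cur = histogram_fractions(current, edges)
    psi = 0.0
    for r, c in zip(ref, cur):
        # Floor empty bins at eps so the logarithm stays finite.
        r, c = max(r, eps), max(c, eps)
        psi += (c - r) * math.log(c / r)
    return psi


def drift_alert(
    reference: Sequence[float],
    current: Sequence[float],
    edges: Sequence[float],
    threshold: float = 0.2,  # assumed rule-of-thumb value, to be tuned per use case
) -> bool:
    """Return True when the PSI between the two samples exceeds the threshold."""
    return population_stability_index(reference, current, edges) > threshold
```

In such a setup, `reference` would hold values recorded when the model was validated and `current` a recent window from data taking. Binning and threshold would need tuning for each monitored quantity.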
