Speakers
Description
The growing demand for fast and reliable Machine Learning (ML) inference in the hardware triggers of High Energy Physics (HEP) experiments introduces new challenges in model development, deployment, and long-term sustainability. This proposal aims to develop a generic, CERN-wide ML Operations (MLOps) framework that provides end-to-end support for the lifecycle of ML models targeting Field Programmable Gate Arrays (FPGAs). From raw data generation in frameworks such as Athena and CMSSW to model deployment in in-detector firmware, the system will focus on improving traceability and accelerating iteration as detector conditions change. The framework will be modular, open to multiple experiments and toolchains (e.g. hls4ml, FINN, Vitis-AI), and prepared to scale to the High-Luminosity LHC (HL-LHC) and beyond.
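To make the traceability goal concrete, the following is a minimal sketch of a lineage record that ties one firmware build to the data, model, and toolchain it came from. The class `LineageRecord`, its field names, the helper `sha256_bytes`, and the conditions-tag naming are all hypothetical illustrations, not part of the proposal or of any existing Athena, CMSSW, or hls4ml API.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def sha256_bytes(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string (e.g. the contents of a file)."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical record linking a deployed model to its inputs."""
    dataset_sha256: str     # digest of the training dataset contents
    model_sha256: str       # digest of the serialized model weights
    toolchain: str          # e.g. "hls4ml", "FINN", "Vitis-AI"
    toolchain_version: str  # placeholder string; the format is an assumption
    conditions_tag: str     # label for the detector conditions (assumed naming)

    def record_id(self) -> str:
        # Serialize with sorted keys so identical inputs always yield the same ID.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:16]


if __name__ == "__main__":
    record = LineageRecord(
        dataset_sha256=sha256_bytes(b"placeholder dataset bytes"),
        model_sha256=sha256_bytes(b"placeholder weight bytes"),
        toolchain="hls4ml",
        toolchain_version="0.0.0",      # illustrative value only
        conditions_tag="example-tag",   # illustrative value only
    )
    print(record.record_id())
```

Because the identifier is derived only from the record's contents, two builds made from the same data, weights, toolchain, and conditions tag share an ID, while a change in any one of them produces a different ID.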
CERN group / Experiment
CERN ATLAS Team
| Working area | Area 5: Infrastructure for AI Deployment |
|---|---|
| Project goals | This proposal targets the creation of a coherent infrastructure: a fully traceable, modular MLOps framework tailored to ML-to-FPGA deployment in HEP trigger systems. The goals are to: establish an end-to-end pipeline; standardise data conversion layers; integrate metadata tracking; develop hooks for rapid model evaluation; remain toolchain-agnostic by supporting multiple back-ends; support hardware-in-the-loop evaluation, so that accelerated models can be validated and profiled directly in test-bed setups or on emulation platforms; and enable partial reconfiguration or selective retraining by identifying layers or models that can quickly absorb detector shifts without full retraining or synthesis. |
| Timeline | Year 1: ROOT-to-NPZ Conversion Tool; MLOps Data Pipeline Prototype; FPGA-Compatible Model Packaging. Year 2: Conversion/Tracing Integration; Alert System for ML Drift; Retrain-ability Evaluation Framework. Year 3: Bit-Accurate SIM-to-HW Benchmarks; Final MLOps Deployment Toolkit; Transition to a production system. |
| Available person power | 0.4 FTE |
| Additional person power request | 36 GRAP months, 36 TECH months |
| Is this an already ongoing activity? | No |
| Indicative hardware resource needs | Access to evaluation systems, either existing or to be obtained |
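
As an illustration of what the Year 2 "Alert System for ML Drift" item in the timeline could check, the sketch below compares a reference sample of a model input (or output) with a recent sample using the two-sample Kolmogorov-Smirnov statistic. The choice of this statistic, the function names `ks_statistic` and `drift_alert`, and the threshold value are assumptions made for the example; the proposal does not specify a drift metric.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_ref(x) - ECDF_cur(x)|."""
    ref = sorted(reference)
    cur = sorted(current)
    if not ref or not cur:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # is always reached at one of the observed values.
    for x in set(ref) | set(cur):
        f_ref = bisect_right(ref, x) / len(ref)  # fraction of ref values <= x
        f_cur = bisect_right(cur, x) / len(cur)  # fraction of cur values <= x
        d = max(d, abs(f_ref - f_cur))
    return d


def drift_alert(reference, current, threshold=0.2):
    """Return (alert, statistic). The default threshold is a placeholder, not a tuned value."""
    d = ks_statistic(reference, current)
    return d > threshold, d


if __name__ == "__main__":
    reference = [1.0, 2.0, 3.0, 4.0]   # e.g. a feature under nominal conditions
    shifted = [3.0, 4.0, 5.0, 6.0]     # the same feature after a detector shift
    print(drift_alert(reference, reference))
    print(drift_alert(reference, shifted))
```

In a real system the threshold would need to account for sample size and the tolerated false-alarm rate, and an alert would feed into the retraining evaluation rather than trigger retraining on its own.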