Description
We present an MLOps approach for managing the end-to-end lifecycle of machine learning algorithms deployed on FPGAs in the CMS Level-1 Trigger (L1T). The primary goal of the pipeline is to respond to evolving detector conditions by automatically acquiring up-to-date training data, retraining and re-optimising the model, validating performance, synthesising firmware, and deploying validated firmware into both online and offline environments.
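As an illustration of how such a lifecycle might be expressed as a Kubeflow pipeline, the sketch below uses the Kubeflow Pipelines SDK (kfp v2); the component names, placeholder bodies, and parameters are assumptions for illustration and do not describe the actual CMS implementation.

```python
# A minimal sketch, assuming the Kubeflow Pipelines SDK (kfp v2); component
# names and placeholder bodies are illustrative, not the CMS implementation.
from kfp import compiler, dsl


@dsl.component(base_image="python:3.11")
def acquire_training_data(run_range: str) -> str:
    """Fetch up-to-date training data for the given detector run range."""
    return f"dataset-{run_range}"


@dsl.component(base_image="python:3.11")
def retrain_and_optimise(dataset: str) -> str:
    """Retrain and re-optimise the model on the freshly acquired dataset."""
    return f"model-trained-on-{dataset}"


@dsl.component(base_image="python:3.11")
def validate_model(model: str) -> str:
    """Check physics and rate performance before firmware synthesis."""
    return "pass"


@dsl.component(base_image="python:3.11")
def synthesise_and_deploy(model: str, validation: str) -> str:
    """Synthesise FPGA firmware from the validated model and stage it for
    deployment to the online and offline environments."""
    return f"firmware-for-{model}"


@dsl.pipeline(name="l1t-ml-lifecycle")
def l1t_ml_lifecycle(run_range: str = "latest"):
    # Chain the stages described in the abstract: data acquisition,
    # retraining/re-optimisation, validation, synthesis and deployment.
    data = acquire_training_data(run_range=run_range)
    model = retrain_and_optimise(dataset=data.output)
    validation = validate_model(model=model.output)
    synthesise_and_deploy(model=model.output, validation=validation.output)


if __name__ == "__main__":
    compiler.Compiler().compile(l1t_ml_lifecycle, "l1t_ml_lifecycle.yaml")
```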
In addition, we use the CMS Level-1 Scouting stream—which bypasses L1T selection—to detect drifts in the model output distribution. This enables us to quantify the operational lifetime of ML models deployed at the L1T and support continual learning strategies, such as triggering retraining or adjusting thresholds to maintain optimal performance.
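The abstract does not specify the drift metric; as one possibility, the distribution of model scores observed in the Scouting stream could be compared against a reference distribution with a two-sample Kolmogorov–Smirnov test, as in the sketch below (function names, thresholds, and data are hypothetical).

```python
# Illustrative drift check on model output distributions; not the CMS method.
import numpy as np
from scipy.stats import ks_2samp


def detect_output_drift(reference_scores: np.ndarray,
                        recent_scores: np.ndarray,
                        p_threshold: float = 1e-3) -> bool:
    """Flag drift when recent Scouting-stream model outputs differ
    significantly from the reference distribution taken at deployment time."""
    _, p_value = ks_2samp(reference_scores, recent_scores)
    return p_value < p_threshold


# Hypothetical usage with synthetic data: a small shift in the score mean
# stands in for a change in detector conditions.
rng = np.random.default_rng(seed=0)
reference = rng.normal(loc=0.0, scale=1.0, size=100_000)
recent = rng.normal(loc=0.1, scale=1.0, size=100_000)
if detect_output_drift(reference, recent):
    print("Drift detected: trigger retraining or adjust thresholds")
```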
The pipeline is built on CERN’s computing resources and integrates with the Kubeflow platform, GitLab CI/CD, and the WLCG, offering a scalable solution for real-time ML deployment. This infrastructure lays the groundwork for rapid iteration and long-term sustainability of ML-based trigger algorithms; such capabilities will become increasingly important as ML is adopted more widely in evolving, low-latency environments.
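For instance, a GitLab CI/CD job could submit the compiled pipeline to a Kubeflow endpoint through the kfp client; the host URL and run parameters in the sketch below are hypothetical.

```python
# Minimal sketch of a CI job submitting the compiled pipeline to Kubeflow;
# the endpoint and parameter names are assumptions, not CERN's actual setup.
import kfp


def submit_retraining_run(package: str = "l1t_ml_lifecycle.yaml",
                          run_range: str = "latest") -> None:
    # Hypothetical Kubeflow Pipelines endpoint on CERN infrastructure.
    client = kfp.Client(host="https://example-kubeflow.cern.ch/pipeline")
    client.create_run_from_pipeline_package(
        package,
        arguments={"run_range": run_range},
        run_name=f"l1t-ml-retraining-{run_range}",
    )


if __name__ == "__main__":
    submit_retraining_run()
```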
Experiment context, if any
CMS experiment