AI RCS Strategy Workshop

Name: AI RCS Strategy Workshop
Start: 2025-09-15T08:09:00+02:00
End: 2025-09-19T18:00:00+02:00
Location: CERN

Europe/Zurich timezone

Efficient Heterogeneous Machine Learning Inference

16 Sept 2025, 12:40

40/S2-A01 - Salle Anderson (CERN)

5. Infrastructure for AI Deployment

Speaker: Lukasz Michalski (Wroclaw University of Science and Technology (PL))

Current CMSSW workflows suffer from inefficient CPU-GPU data transfers when running machine learning models. These transfers can add up to several hundred milliseconds of overhead per event, which is a serious problem in real-time environments such as the trigger level. The overhead reduces performance and scalability and makes it harder to fully leverage ML in CMS operations.
Our project addresses this challenge by enabling models to access GPU-resident data directly, without redundant copies. We will develop a user-friendly interface that integrates seamlessly with CMSSW’s Structure of Arrays (SoA) format, supports multiple ML model outputs, and scales across heterogeneous hardware backends through alpaka. Inference can then run on the device where the data, produced by earlier heterogeneous algorithms, already reside.
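
The idea of handing a model a view of existing SoA columns, rather than a copy, can be illustrated with a minimal CPU-only sketch. This is not the CMSSW or alpaka interface: the class `ParticleSoA`, the method `column_view`, and the function `toy_model` are hypothetical names chosen for illustration, and host memory stands in for device memory.

```python
from array import array


class ParticleSoA:
    """Hypothetical Structure of Arrays: one contiguous buffer per field."""

    def __init__(self, pt, eta, phi):
        self.pt = array("f", pt)
        self.eta = array("f", eta)
        self.phi = array("f", phi)

    def column_view(self, name):
        # memoryview exposes the column's existing buffer; no elements are copied.
        return memoryview(getattr(self, name))


def toy_model(pt_view, eta_view):
    # Stand-in for an ML model: it reads its inputs through the views.
    return [p * abs(e) for p, e in zip(pt_view, eta_view)]


soa = ParticleSoA(pt=[10.0, 20.0, 30.0], eta=[0.5, -1.0, 2.0], phi=[0.0, 1.0, 2.0])
pt_view = soa.column_view("pt")
eta_view = soa.column_view("eta")
scores = toy_model(pt_view, eta_view)

# A write to the original column is visible through the view,
# which shows that both refer to the same memory.
soa.pt[0] = 99.0
shares_memory = pt_view[0] == 99.0
```

Because each field is stored contiguously, a single column can be passed as a model input without repacking; an Array of Structures layout would need a gather step first. In the setting described above, the equivalent view would point at device memory, so the input would not need a round trip through the host.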

Key Benefits
- Performance and scalability: Eliminates costly memory transfers, accelerating ML inference in both online and offline workflows.
- Ease of use: Simplifies ML integration by providing a standardized interface.
- Future readiness: Supports flexible model deployment on diverse and evolving hardware.
- Strategic alignment: Strengthens CERN’s investment in heterogeneous frameworks, enabling efficient use of diverse hardware.

CERN group/ Experiment

EP-CMG

Working area	Area 5: Infrastructure for AI Deployment
Project goals	Provide a user-friendly interface for efficient SoA-to-ML data handling. Enable robust model deployment pipelines with both testing and production-ready execution. Extend support to multiple GPU vendors (Nvidia and AMD). Optimize resource scheduling across heterogeneous devices with minimal overhead.
Timeline	3 years
Available person power	Technical students, finishing at the end of 2025
Additional person power request	1 ORIGIN
Is this an already ongoing activity?	Yes
Indicative hardware resource needs	Access to a GPU cluster with an LCG-like software stack, cvmfs access, and fast storage facilities for the full duration of the project

Authors: Felice Pantaleo (CERN); Christine Zeh (Vienna University of Technology (AT)); Lukasz Michalski (Wroclaw University of Science and Technology (PL))

Co-authors: Leonardo Beltrame (Politecnico di Milano (IT)); Eric Cano (CERN); Davide Valsecchi (ETH Zurich (CH))

Presentation materials: Efficient Heterogeneous Machine Learning Inference.pdf
