15–19 Sept 2025
CERN
Europe/Zurich timezone

Efficient Heterogeneous Machine Learning Inference

16 Sept 2025, 12:40 (5 min)
40/S2-A01 - Salle Anderson (CERN)

5. Infrastructure for AI Deployment

Speaker

Lukasz Michalski (Wroclaw University of Science and Technology (PL))

Description

Current CMSSW workflows suffer from inefficient CPU-GPU data transfers when running machine learning models, leading to significant overhead. This overhead can reach several hundred milliseconds per event, which is a serious problem in real-time environments such as the trigger. It reduces performance and scalability, making it harder to fully leverage ML in CMS operations.
Our project addresses this challenge by enabling models to access GPU-resident data directly, without redundant copies. We will develop a user-friendly interface that integrates seamlessly with CMSSW's Structure of Arrays (SoA) format, supports ML models with multiple outputs, and scales across heterogeneous hardware backends through alpaka. Inference can then run on the device where the input data, produced by upstream heterogeneous algorithms, already reside.
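The zero-copy idea can be illustrated with a minimal C++ sketch. It assumes a toy SoA collection held in ordinary host memory; in CMSSW the buffers could be GPU-resident and managed through alpaka, which this sketch does not use. The names `HitSoA`, `ColumnView`, `ModelBinding`, `bindColumns`, and `runToyModel` are hypothetical and are not the project's or CMSSW's API. The binding records only column pointers, so the stand-in model reads its inputs and writes its multiple outputs in place.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical Structure-of-Arrays (SoA) collection: one contiguous buffer
// per column. Host vectors are used so the sketch compiles and runs anywhere.
struct HitSoA {
  std::vector<float> x, y, z;       // input features, one entry per element
  std::vector<float> score, label;  // model outputs, filled in place
  explicit HitSoA(std::size_t n) : x(n), y(n), z(n), score(n), label(n) {}
};

// Non-owning view of one column: a pointer and a length, never a copy.
struct ColumnView {
  float* data;
  std::size_t size;
};

// Hypothetical binding of model inputs and outputs to SoA columns.
struct ModelBinding {
  std::array<ColumnView, 3> inputs;   // x, y, z
  std::array<ColumnView, 2> outputs;  // score, label (multiple outputs)
};

// Build the binding by taking column pointers; no element is copied.
ModelBinding bindColumns(HitSoA& soa) {
  const std::size_t n = soa.x.size();
  assert(soa.y.size() == n && soa.z.size() == n);
  assert(soa.score.size() == n && soa.label.size() == n);
  ModelBinding b{};
  b.inputs[0] = ColumnView{soa.x.data(), n};
  b.inputs[1] = ColumnView{soa.y.data(), n};
  b.inputs[2] = ColumnView{soa.z.data(), n};
  b.outputs[0] = ColumnView{soa.score.data(), n};
  b.outputs[1] = ColumnView{soa.label.data(), n};
  return b;
}

// Stand-in for a real inference engine (hypothetical): reads the bound input
// columns and writes both outputs directly into the bound output columns.
void runToyModel(const ModelBinding& b) {
  const std::size_t n = b.inputs[0].size;
  for (std::size_t i = 0; i < n; ++i) {
    float sum = 0.f;
    for (const ColumnView& in : b.inputs) sum += in.data[i];
    b.outputs[0].data[i] = sum;                    // output 0: score
    b.outputs[1].data[i] = sum > 1.f ? 1.f : 0.f;  // output 1: label
  }
}
```

Because `bindColumns` stores pointers rather than copies, the toy model's results appear directly in `soa.score` and `soa.label`. A real implementation would hand equivalent views to an inference engine running on the same device as the data.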

Key Benefits
- Performance and scalability: Eliminates costly memory transfers, accelerating ML inference in both online and offline workflows.
- Ease of use: Simplifies ML integration by providing a standardized interface.
- Future readiness: Supports flexible model deployment on diverse and evolving hardware.
- Strategic alignment: Builds on CERN's investment in heterogeneous frameworks, enabling efficient use of diverse hardware.

CERN group/experiment

EP-CMG

Working area: Area 5, Infrastructure for AI Deployment
Project goals:
- Provide a user-friendly interface for efficient SoA-to-ML data handling.
- Enable robust model deployment pipelines supporting both testing and production-ready execution.
- Extend support to multiple GPU vendors (NVIDIA and AMD).
- Optimize resource scheduling across heterogeneous devices with minimal overhead.
Timeline: 3 years
Available person power: technical students, finishing at the end of 2025
Additional person power request: 1 ORIGIN
Is this an already ongoing activity? Yes
Indicative hardware resource needs: access to a GPU cluster with an LCG-like software stack, cvmfs access, and fast storage facilities for the full duration of the project
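The scheduling goal, together with the aim of running inference where the data already reside, can be illustrated with a simple cost-based placement rule. This is a minimal sketch under stated assumptions: `Device`, `PlacementChoice`, and `choosePlacement` are hypothetical, the per-device inference time and the copy cost are estimates the caller must supply, and it is not the project's scheduler.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical description of a device that could run inference.
struct Device {
  std::string name;  // e.g. "cpu", "gpu0" (illustrative labels)
  double computeMs;  // assumed inference time on this device
};

struct PlacementChoice {
  std::size_t index;  // position of the chosen device in the list
  double costMs;      // estimated total cost of that choice
};

// Pick the device with the lowest estimated cost. Running anywhere other than
// dataDevice (where inputs and outputs live) adds an assumed copy cost.
PlacementChoice choosePlacement(const std::vector<Device>& devices,
                                std::size_t dataDevice, double transferMs) {
  assert(!devices.empty() && dataDevice < devices.size());
  PlacementChoice best{0, std::numeric_limits<double>::infinity()};
  for (std::size_t i = 0; i < devices.size(); ++i) {
    const double cost =
        devices[i].computeMs + (i == dataDevice ? 0.0 : transferMs);
    if (cost < best.costMs) best = {i, cost};
  }
  return best;
}
```

With a large copy cost the rule keeps inference on the device that already holds the data; if the copy is cheap relative to the difference in compute time, it may move the work to a faster device.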

Authors

Felice Pantaleo (CERN), Christine Zeh (Vienna University of Technology (AT)), Lukasz Michalski (Wroclaw University of Science and Technology (PL))

Co-authors

Leonardo Beltrame (Politecnico di Milano (IT)), Eric Cano (CERN), Davide Valsecchi (ETH Zurich (CH))
