Speaker
Description
Current CMSSW workflows suffer from inefficient CPU-GPU data transfers when running machine learning (ML) models. The resulting overhead can add up to several hundred milliseconds per event, a serious problem in real-time environments such as the trigger level. It limits performance and scalability and makes it harder to fully leverage ML in CMS operations.
Our project addresses this challenge by letting models access GPU-resident data directly, without redundant copies. We will develop a user-friendly interface that integrates seamlessly with CMSSW’s Structure of Arrays (SoA) format, supports multiple ML model outputs, and scales across heterogeneous hardware backends through alpaka. Inference can then run on the device where the data, produced by earlier heterogeneous algorithms, already reside.
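The idea of handing SoA columns to a model without copying them can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `TrackSoA`, `ColumnView`, `makeView`, and `toyModel` are hypothetical names, not CMSSW or alpaka APIs, and ordinary host memory stands in for device memory. The point is only that a non-owning view (pointer plus length) over a contiguous column lets the consumer read the producer's buffer in place.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical SoA collection: one contiguous column per field.
// Host memory stands in for device memory in this sketch.
struct TrackSoA {
  std::vector<float> pt;
  std::vector<float> eta;
};

// Hypothetical non-owning view over one column: pointer plus length.
struct ColumnView {
  const float* data;
  std::size_t size;
};

// Wrap a column without copying its elements.
ColumnView makeView(const std::vector<float>& column) {
  return ColumnView{column.data(), column.size()};
}

// Toy stand-in for a model: reads the input columns in place and
// returns one score per track (0.5 * pt + 2 * eta).
std::vector<float> toyModel(ColumnView pt, ColumnView eta) {
  assert(pt.size == eta.size);
  std::vector<float> scores(pt.size);
  for (std::size_t i = 0; i < pt.size; ++i) {
    scores[i] = 0.5f * pt.data[i] + 2.0f * eta.data[i];
  }
  return scores;
}
```

In a real deployment the view would wrap a device pointer and be bound to the inference engine's input tensor; how that binding is done depends on the engine and backend and is not shown here.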
Key Benefits
- Performance and scalability: Eliminates costly memory transfers, accelerating ML inference in both online and offline workflows.
- Ease of use: Simplifies ML integration by providing a standardized interface.
- Future readiness: Supports flexible model deployment on diverse and evolving hardware.
- Strategic alignment: Strengthens CERN’s investment in heterogeneous frameworks, enabling efficient use of diverse hardware.
CERN group / Experiment
EP-CMG
| Working area | Area 5: Infrastructure for AI Deployment |
|---|---|
| Project goals | Provide a user-friendly interface for efficient SoA-to-ML data handling. Enable robust model deployment pipelines with both testing and production-ready execution. Extend support to multiple GPU vendors (NVIDIA and AMD). Optimize resource scheduling across heterogeneous devices with minimal overhead. |
| Timeline | 3 years |
| Available person power | Technical students, finishing at the end of 2025 |
| Additional person power request | 1 ORIGIN |
| Is this an already ongoing activity? | Yes |
| Indicative hardware resource needs | Access to a GPU cluster with an LCG-like software stack, CVMFS access, and fast storage facilities for the full duration of the project |