Description
Machine learning has become an important tool across the LHC experiments, supporting tasks ranging from simulation and event reconstruction to anomaly detection and physics analysis. These applications demand inference paradigms that are efficient, low in latency, and seamlessly integrable into high-energy physics (HEP) workflows. While numerous frameworks exist for developing, training, and evaluating ML models, deploying these models for inference at CERN remains a significant challenge, largely because of the complexity of integrating them with existing HEP workflows. The challenge is amplified by the upcoming High-Luminosity LHC upgrade, which will significantly increase data generation rates and therefore calls for highly optimized inference solutions.
Within the ML4EP team at CERN, we have been investigating strategies for optimized ML inference and have developed SOFIE, a tool within ROOT/TMVA that translates trained ML models in ONNX format into an intermediate representation and then generates highly optimized C++ code for efficient, lightweight inference. The generated code has minimal external dependencies, requiring only BLAS, and includes algorithmic optimizations that ensure low latency. SOFIE also supports model import from popular training frameworks such as Keras and PyTorch, along with a Python interface for ease of use. Its flexible design allows seamless integration into event-based workflows, enabling user-defined modifications and inference on real-time data streams. SOFIE currently supports a broad range of ML operations and is extensible with custom operations. It has demonstrated compatibility with complex architectures, such as Graph Neural Networks, by supporting inference from models trained with DeepMind's Graph Nets library; our experiments have shown efficient inference on models such as ATLAS-GNNs, ParticleNet from CMS, and SmartPixels, among others.
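As an illustration of this workflow, the minimal sketch below parses an ONNX file and emits a self-contained C++ header. It follows the pattern of the SOFIE tutorials shipped with ROOT; the file names, the generated namespace, and the input size are hypothetical, and the exact API may vary between ROOT versions.

```cpp
// Minimal sketch of SOFIE code generation, modeled on the ROOT/TMVA
// tutorials. File names and sizes here are hypothetical.
#include "TMVA/RModel.hxx"
#include "TMVA/RModelParser_ONNX.hxx"

int main() {
   using namespace TMVA::Experimental;

   // Parse the trained ONNX model into SOFIE's intermediate representation.
   SOFIE::RModelParser_ONNX parser;
   SOFIE::RModel model = parser.Parse("model.onnx");

   // Emit optimized C++ inference code plus a .dat file holding the weights.
   model.Generate();
   model.OutputGenerated("model.hxx");

   // The generated header can then be compiled into any C++ workflow, e.g.:
   //   #include "model.hxx"
   //   TMVA_SOFIE_model::Session session("model.dat");
   //   std::vector<float> input(nInputs);  // filled from the event loop
   //   std::vector<float> output = session.infer(input.data());
   return 0;
}
```

Because the output is plain C++ with only a BLAS dependency, the generated header can be dropped directly into an experiment's event loop without pulling in a full ML framework.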
We present the latest developments in SOFIE, including performance improvements and extended support for inference on heterogeneous architectures. Enhancements include memory planning with a Structure-of-Arrays layout for efficient allocation and reuse, as well as operator fusion with kernel-level optimizations that reduce data movement and further decrease latency. We also introduce SOFIE's new capabilities for heterogeneous inference through abstract code generation with SYCL and ALPAKA, supporting both NVIDIA (via cuBLAS) and AMD (via rocBLAS) platforms and offering flexibility and portability across GPU architectures. Finally, we present performance benchmarks comparing SOFIE against other inference frameworks, highlighting its effectiveness and adaptability.
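To indicate the kind of kernel-level fusion meant here, the sketch below fuses a Gemm, bias add, and ReLU chain into a single output pass. This is an illustration of the general technique under simple assumptions (row-major dense tensors), not SOFIE's actual generated code.

```cpp
// Illustration of operator fusion (not SOFIE's actual generated code):
// C = ReLU(A * B + bias) computed in one pass, so the intermediate
// m x n tensor is written once instead of being materialized three times.
#include <algorithm>
#include <cstddef>

void gemm_bias_relu_fused(const float* A, const float* B, const float* bias,
                          float* C, std::size_t m, std::size_t k, std::size_t n) {
   for (std::size_t i = 0; i < m; ++i) {
      for (std::size_t j = 0; j < n; ++j) {
         float acc = bias[j];                  // bias folded into the accumulator
         for (std::size_t p = 0; p < k; ++p)
            acc += A[i * k + p] * B[p * n + j];
         C[i * n + j] = std::max(acc, 0.0f);   // ReLU applied before the store
      }
   }
}
```

Keeping the epilogue in registers is what reduces data movement: separate bias and activation operators would each reread and rewrite the full output tensor.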
Would you like to be considered for an oral presentation? Yes