Description
The rising computational demands of increasing data rates and complex machine learning (ML) algorithms in large-scale scientific experiments have driven the adoption of the Services for Optimized Network Inference on Coprocessors (SONIC) framework. SONIC accelerates ML inference by offloading tasks to local or remote coprocessors, optimizing resource utilization. Its portability across diverse hardware platforms improves data processing and model deployment efficiency in advanced research domains such as high-energy physics (HEP) and multi-messenger astrophysics (MMA).

We developed SuperSONIC, a scalable server infrastructure for SONIC that enables the deployment of computationally intensive inference tasks, such as charged particle reconstruction, on Kubernetes clusters equipped with graphics processing units (GPUs). Leveraging NVIDIA's Triton Inference Server, SuperSONIC decouples client workflows from server infrastructure, standardizing communication, improving throughput, and enabling robust load balancing and monitoring. SuperSONIC has been successfully deployed in production environments, including the CMS and ATLAS experiments at CERN's Large Hadron Collider, the IceCube Neutrino Observatory, and the LIGO gravitational-wave observatory. It offers a reusable, configurable framework that addresses common cloud-native deployment challenges and enhances the efficiency of accelerator-based inference across diverse scientific and industrial applications.
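The client/server decoupling described above can be sketched schematically: the client workflow addresses models only by name and tensor shape, while the server owns the hardware behind them. This is a minimal self-contained stand-in, not the SuperSONIC or Triton API; the class names (`ToyInferenceServer`, `Client`) and the model name `"tracker_dnn"` are hypothetical, chosen only to illustrate the pattern.

```python
import numpy as np

class ToyInferenceServer:
    """Schematic stand-in for a Triton-style server: holds named models
    and serves inference requests. In a real deployment, dispatch would
    route to GPU-backed model instances behind a load balancer."""
    def __init__(self):
        self._models = {}

    def register(self, name, fn):
        self._models[name] = fn

    def infer(self, model_name, inputs):
        return self._models[model_name](inputs)

class Client:
    """Client workflow: knows only model names and tensors, never the
    coprocessor hardware — the decoupling SONIC-style offloading relies on."""
    def __init__(self, server):
        self._server = server

    def infer(self, model_name, array):
        return self._server.infer(model_name, array)

# Register a toy "model" (a fixed linear map standing in for a trained network).
server = ToyInferenceServer()
server.register("tracker_dnn", lambda x: x @ np.ones((4, 2), dtype=np.float32))

client = Client(server)
out = client.infer("tracker_dnn", np.ones((3, 4), dtype=np.float32))
print(out.shape)  # (3, 2)
```

Because the client never touches the model implementation, the server side can be rescaled, re-balanced, or moved to different accelerators without changing client code, which is the property the abstract highlights.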