Speaker
Description
An Artificial Intelligence (AI) model will spend “90% of its lifetime in inference.” Fully utilizing coprocessors, such as FPGAs or GPUs, for AI inference requires O(10) CPU cores to feed work to the coprocessors. Traditional data analysis pipelines cannot use these coprocessors to their full potential. To allow distributed access to coprocessors for AI inference workloads, the LHC’s Compact Muon Solenoid (CMS) experiment developed the concept of Services for Optimized Network Inference on Coprocessors (SONIC) using NVIDIA’s Triton Inference Server. We have extended this concept to the IceCube Neutrino Observatory by deploying NVIDIA Triton Inference Servers in local and external Kubernetes clusters, integrating an NVIDIA Triton client with IceCube’s data analysis framework, and deploying an OAuth2-based HTTP authentication service in front of the Triton Inference Servers. We will describe the setup and our experience integrating it into IceCube’s offline processing system.
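As a rough illustration of the client side of this setup, the sketch below shows a remote inference request made with NVIDIA's Python `tritonclient` HTTP API, with an OAuth2 bearer token attached as an HTTP header for the authentication service in front of the server. The server URL, model name, tensor names, shapes, and token are placeholders, not details of the actual IceCube deployment.

```python
import numpy as np
import tritonclient.http as httpclient

# Placeholders (not the real deployment): endpoint behind the auth proxy
# and an OAuth2 access token obtained out of band.
TRITON_URL = "triton.example.org"
ACCESS_TOKEN = "..."

# ssl=True because the server sits behind an HTTPS authentication proxy.
client = httpclient.InferenceServerClient(url=TRITON_URL, ssl=True)

# Example input: one batch of float32 features with a made-up shape.
data = np.random.rand(1, 128).astype(np.float32)
inputs = [httpclient.InferInput("INPUT__0", data.shape, "FP32")]
inputs[0].set_data_from_numpy(data)
outputs = [httpclient.InferRequestedOutput("OUTPUT__0")]

# The bearer token is passed as an HTTP header on each inference request.
result = client.infer(
    model_name="example_model",
    inputs=inputs,
    outputs=outputs,
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
)
print(result.as_numpy("OUTPUT__0"))
```

The same pattern applies whether the Triton servers run in a local or an external Kubernetes cluster; only the URL and the credential handling change.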
Focus areas
MMA