Description
To address this challenge and prepare for the transition to large, resource-intensive ML models, we propose leveraging AthenaTriton for DAOD production, with these ML models executed on dedicated computing resources. AthenaTriton is a tool for running ML inference as a service in Athena using the NVIDIA Triton server software. We discuss different deployment strategies for Triton servers across heterogeneous computing platforms, including WLCG sites and High Performance Computing centers. We present measurements of several performance metrics, including network transfer rate, latency, and event processing throughput. Finally, we evaluate how the AthenaTriton approach scales with computing resources. These results enable data-driven optimization of future DAOD workflows and support sustainable, efficient large-scale ML inference across the evolving ATLAS computing infrastructure, which will increasingly rely on shared computing resources such as those provided by the American Science Cloud.
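As an illustration of the kind of metrics mentioned above, the following minimal Python sketch shows one way per-request timings could be reduced to latency and event-throughput figures, and how a scaling efficiency could be defined. It is not taken from AthenaTriton: the function names (`summarize_requests`, `scaling_efficiency`), their parameters, and the definition of scaling efficiency as measured throughput divided by ideal linear scaling are all assumptions made for this example.

```python
import statistics


def summarize_requests(latencies_s, events_per_request, wall_time_s):
    """Reduce per-request timings to simple latency and throughput figures.

    Hypothetical helper, not part of AthenaTriton.
    latencies_s        -- round-trip time of each inference request, in seconds
    events_per_request -- number of events batched into each request (assumed constant)
    wall_time_s        -- total elapsed wall-clock time for all requests, in seconds
    """
    if not latencies_s:
        raise ValueError("at least one latency measurement is required")
    if wall_time_s <= 0:
        raise ValueError("wall_time_s must be positive")

    total_events = events_per_request * len(latencies_s)
    return {
        "mean_latency_s": statistics.fmean(latencies_s),
        "max_latency_s": max(latencies_s),
        "throughput_events_per_s": total_events / wall_time_s,
    }


def scaling_efficiency(throughput_single, throughput_n, n_resources):
    """Ratio of measured throughput to ideal linear scaling (assumed definition).

    A value of 1.0 means throughput grew in proportion to the resources added.
    """
    if throughput_single <= 0 or n_resources <= 0:
        raise ValueError("throughput_single and n_resources must be positive")
    return throughput_n / (n_resources * throughput_single)
```

In such a scheme, `summarize_requests` would be applied to the timings collected at each resource configuration, and `scaling_efficiency` would compare the resulting throughput against a single-resource baseline.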