Description
Recent studies on ITk data showed that Graph Neural Network (GNN)-based track finding can provide not only satisfactory tracking efficiency but also reasonable track resolution. However, GNN-based track finding is computationally slow on CPUs, demanding the use of coprocessors such as GPUs to speed up the inference. The large graph size, typically 300k nodes and 1M edges, requires significant GPU memory for feasible computation, and not all ATLAS computing sites are equipped with high-end GPUs such as A100s. These challenges must be addressed before GNN-based track finding can be deployed in production. We propose to address them by establishing the GNN-based track-finding algorithm as a service hosted either in clouds or at high-performance computing centers.
In this poster, we will describe the implementation of the GNN-based track-finding workflow as a service using the Nvidia Triton inference server. The pipeline contains three discrete deep-learning models and two CUDA-based algorithms. Because of the heterogeneity of the workflow, we explore different server settings to maximize the track-finding throughput. At the same time, we study the scalability of the inference server using the Perlmutter supercomputer at NERSC and cloud resources such as AWS and Google Cloud. We will present the studies performed with the stand-alone algorithm; integration and optimization of the workflow into ACTS and Athena are in progress.
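For illustration, a client of such a service could look like the minimal Python sketch below, which sends one event's spacepoint features to a Triton server and retrieves track labels. The model name, tensor names, and feature layout are placeholder assumptions, not the actual deployment configuration described in the poster.

```python
# Minimal sketch of a Triton gRPC client for an as-a-service track-finding
# pipeline. "gnn_track_finding", "FEATURES", and "TRACK_LABELS" are
# hypothetical names used only for illustration.
import numpy as np
import tritonclient.grpc as grpcclient

# Connect to a Triton server (e.g. a cloud or HPC endpoint).
client = grpcclient.InferenceServerClient(url="localhost:8001")

# Hypothetical input: one row per spacepoint with a few detector features.
spacepoints = np.random.rand(300_000, 3).astype(np.float32)

infer_input = grpcclient.InferInput("FEATURES", list(spacepoints.shape), "FP32")
infer_input.set_data_from_numpy(spacepoints)

# Hypothetical output: a track-candidate label for each spacepoint.
requested_output = grpcclient.InferRequestedOutput("TRACK_LABELS")

response = client.infer(
    model_name="gnn_track_finding",
    inputs=[infer_input],
    outputs=[requested_output],
)
track_labels = response.as_numpy("TRACK_LABELS")
print(track_labels.shape)
```

In this setup the client only ships spacepoint data and receives track candidates, so the GPU memory demands of the large graphs stay on the server side, which is the motivation for the as-a-service approach.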