28th Conference on Computing in High Energy and Nuclear Physics (CHEP 2026)

Name: 28th Conference on Computing in High Energy and Nuclear Physics (CHEP 2026)
Start: 2026-05-25T08:00:00+07:00
End: 2026-05-29T14:00:00+07:00
Location: Chulalongkorn University

Asia/Bangkok timezone

An On-Grid deployment of ML Inference as a service at a Tier-2

26 May 2026, 16:51

18m

MHMK 202

Oral Presentation, Track 4 - Distributed computing

Speaker: Albert Gyorgy Borbely (University of Glasgow (GB))

Recent developments demonstrate that HEP software can run effectively on
GPUs, while advances in ML models have shown predictable scaling laws
for compute, data, and model size, consistent with trends across the
wider AI community. As a result, there is growing demand within HEP for
inference using larger models that have already delivered significant
physics gains, such as b-tagging in ATLAS with the GN2 transformer-based
neural network.

At present, ML inference in HEP is largely performed on CPUs using
translation libraries such as ONNX. However, a sharp rise in RAM costs,
driven by supply constraints and strong demand for HBM2 high-bandwidth
memory, makes it increasingly unlikely that WLCG sites will move far
beyond the 2 GB per-job memory limit. In response, both
the ATLAS and CMS collaborations have proposed inference-as-a-service
solutions to simplify model deployment while addressing memory
constraints and rapidly growing model sizes.
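
As a rough illustration of the memory argument, the sketch below estimates the weight footprint of a model from its parameter count and compares it with the 2 GB per-job limit. The parameter counts and the 4-byte (FP32) weight size are illustrative assumptions, not figures from ATLAS or CMS, and the estimate ignores activations and framework overhead.

```python
# Back-of-the-envelope estimate of model weight memory versus a per-job limit.
# All model sizes used here are hypothetical and chosen only for illustration.

JOB_LIMIT_GB = 2.0  # per-job memory limit quoted in the abstract


def model_weights_gb(n_params: int, bytes_per_param: int = 4) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 10**9 bytes)."""
    return n_params * bytes_per_param / 10**9


def fits_in_job_limit(n_params: int, bytes_per_param: int = 4,
                      limit_gb: float = JOB_LIMIT_GB) -> bool:
    """True if the weights alone fit under the per-job limit."""
    return model_weights_gb(n_params, bytes_per_param) <= limit_gb


def per_job_copies_gb(n_params: int, n_jobs: int,
                      bytes_per_param: int = 4) -> float:
    """Total memory if every job on a node loads its own copy of the model."""
    return n_jobs * model_weights_gb(n_params, bytes_per_param)


if __name__ == "__main__":
    for n_params in (100_000_000, 1_000_000_000):  # hypothetical model sizes
        size = model_weights_gb(n_params)
        print(f"{n_params:>13,d} params: {size:.1f} GB, "
              f"fits in job limit: {fits_in_job_limit(n_params)}")
```

With a shared inference server, a single copy of the weights is held on the server rather than one copy per job, which is the saving that `per_job_copies_gb` makes visible.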

One possible implementation is an on-Grid inference-as-a-service
deployment that uses site-local GPUs with the NVIDIA Triton inference
server and standard Grid tools, including ARC-CE, HTCondor, CVMFS, and
XCache. We describe progress on this approach at the Glasgow Tier-2 WLCG
site, along with tests involving the submission of Grid jobs. Reusing
underutilised GPU resources already available at Grid sites could offer
a pragmatic way to meet the increasing demand for this type of service.
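
To make the client side of such a service concrete, the following is a minimal sketch of how a Grid job might prepare a request for a Triton server over its HTTP/REST interface, which follows the KServe v2 inference protocol. It only builds the URL and JSON body and does not contact a server. The host name, model name, and input tensor name are hypothetical, and nothing here is taken from the Glasgow deployment.

```python
import json
from math import prod


def infer_url(base_url: str, model_name: str) -> str:
    """URL of the KServe v2 inference endpoint for a given model."""
    return f"{base_url.rstrip('/')}/v2/models/{model_name}/infer"


def build_infer_request(input_name: str, shape: list, datatype: str,
                        data: list) -> dict:
    """Build a v2 inference request body for one input tensor.

    `data` is the tensor flattened in row-major order; its length must
    match the product of `shape`.
    """
    if prod(shape) != len(data):
        raise ValueError("data length does not match tensor shape")
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": list(shape),
                "datatype": datatype,
                "data": list(data),
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical site-local server and model; 8000 is Triton's default HTTP port.
    url = infer_url("http://triton.example.org:8000", "hypothetical_tagger")
    body = build_infer_request("INPUT__0", [2, 3], "FP32",
                               [0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
    print(url)
    print(json.dumps(body))
```

In a real job the body would be sent with an HTTP POST to the URL, for example through the `tritonclient` package or any HTTP library available in the job environment.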

Author: Albert Gyorgy Borbely (University of Glasgow (GB))

Co-authors: David Britton (University of Glasgow (GB)), Emanuele Simili (University of Glasgow (GB)), Gordon Stewart, Samuel Cadellin Skipsey

Presentation materials: CHEP26_talk.pdf
