8–12 Sept 2025
Hamburg, Germany
Europe/Berlin timezone

Abstracting heterogeneous resources in the ALICE Grid

Not scheduled
30m

Poster
Track 1: Computing Technology for Physics Research
Poster session with coffee break

Speaker

Maksim Melnik Storetvedt (Western Norway University of Applied Sciences (NO))

Description

As workflows grow more complex and data rates rise, accelerators have gained importance within ALICE and the Worldwide LHC Computing Grid (WLCG). Consequently, GPU support was added to JAliEn, the ALICE Grid middleware, in a transparent manner: these resources are used automatically when available, without breaking existing mechanisms for payload isolation and compatibility.

This support has so far been limited to the ALICE Event Processing Nodes (EPNs), as driver restrictions and hardware variations may prevent the pilot from enabling GPU support when the execution environment strays too far from the current norm. Furthermore, even when enabled, each Grid payload is ultimately tailored to a specific GPU model and requires additional optimization when deployed on a different one. With ever-increasing amounts of data and the HL-LHC on the horizon, being able to offload GPU workflows to additional clusters in the ALICE Grid becomes a priority.

This contribution examines how GPU support can be extended to other computing sites in the ALICE Grid, such as the Perlmutter HPC at NERSC, with the aim of running ALICE reconstruction workflows beyond the existing ALICE EPN cluster.

Significance

This presentation introduces the ALICE middleware solution for managing heterogeneous resources within the ALICE Grid. It details how support for accelerators, such as GPUs, was seamlessly integrated into the ALICE Grid middleware layer, enabling transparent usage while ensuring that resource brokering prevents overlap between jobs and workloads. Furthermore, it explores how this support can be extended to accommodate less conventional resources, such as Cray supercomputers like the LBL Perlmutter, marking a departure from the GPU-centric environments of the ALICE EPNs.
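
The brokering idea mentioned above, where concurrent payloads never share an accelerator, can be illustrated with a minimal sketch. This is not JAliEn code: the `GpuBroker` class and `payload_env` helper are hypothetical names introduced here, and the assumption is that a pilot restricts each payload to its assigned devices through the vendor runtime variables `CUDA_VISIBLE_DEVICES` (NVIDIA) or `ROCR_VISIBLE_DEVICES` (AMD ROCm).

```python
class GpuBroker:
    """Hypothetical node-level broker that hands out disjoint GPU indices."""

    def __init__(self, total_gpus):
        self._free = list(range(total_gpus))  # device indices not yet leased
        self._leases = {}                     # job id -> list of device indices

    def acquire(self, job_id, count):
        """Reserve `count` devices for a job, or return None if not possible."""
        if job_id in self._leases or count > len(self._free):
            return None
        devices, self._free = self._free[:count], self._free[count:]
        self._leases[job_id] = devices
        return devices

    def release(self, job_id):
        """Return a finished job's devices to the free pool."""
        self._free = sorted(self._free + self._leases.pop(job_id, []))


def payload_env(devices, vendor="nvidia"):
    """Build the environment fragment that limits a payload to its devices."""
    # Assumption: visibility is controlled via the vendor runtime variable.
    var = "CUDA_VISIBLE_DEVICES" if vendor == "nvidia" else "ROCR_VISIBLE_DEVICES"
    return {var: ",".join(str(d) for d in devices)}
```

With four devices on a node, two payloads requesting two GPUs each receive non-overlapping sets, and a third request is refused until one of them releases its lease. A production broker would additionally need to handle driver and hardware differences between sites, which is the subject of this contribution.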

Experiment context, if any

A Large Ion Collider Experiment (ALICE)

Authors

Irakli Chakaberia (Lawrence Berkeley National Lab. (US))
Maksim Melnik Storetvedt (Western Norway University of Applied Sciences (NO))
