Description
At many Worldwide LHC Computing Grid (WLCG) sites, HPC resources are already integrated, or will be integrated in the near future, into the experiment-specific workflows. The integration can be done either opportunistically, using otherwise idle resources for a limited period of time, or permanently. The WLCG ATLAS Tier-2 cluster in Freiburg has been extended in both ways: opportunistic use of resources from the NEMO HPC cluster in Freiburg and permanent use of the HoreKa HPC cluster at KIT.
In order to integrate the computing resources into the Tier-2 cluster in Freiburg in a manner that is both transparent and efficient, a container-based approach was adopted, utilising the meta-scheduler COBalD/TARDIS. TARDIS launches so-called drones on the HPC clusters, which provide the Tier-2 cluster with additional resources. To differentiate these augmented resources from their counterparts installed in Freiburg, accounting is handled by the AUDITOR accounting ecosystem.
The compute hardware of the local Tier-2 cluster and the NEMO HPC cluster are largely identical and were replaced simultaneously. This facilitated a comprehensive analysis of the impact of various factors. First, the performance of identical hardware was compared between the bare-metal installation of a typical WLCG compute server and drones on the HPC clusters. Furthermore, the influence of direct access from the Freiburg Tier-2 cluster and the Freiburg HPC cluster to the Freiburg-based storage, as well as remote access from HoreKa in Karlsruhe, was analysed. Finally, the impact of varying drone sizes was investigated. These results will have a significant impact on the German HEP community's computing strategy for the next 5-10 years.
Significance
The integration of HPC resources into the workflow of LHC experiments has been an ongoing process. However, the transparent integration of such resources is particularly challenging due to the strict usage requirements of HPC centres, which typically do not permit privileged access. The aim of this contribution is twofold: firstly, to describe the transparent integration of HPC resources, and secondly, to take advantage of the unique opportunity to compare identical hardware in standard WLCG operation and on an HPC cluster with minimal spatial separation, as well as different hardware at a moderate spatial distance but with very good network connectivity. The results of the analysis enable a more detailed examination of the individual factors that contribute to the performance of dynamically integrated HPC resources. Consequently, these findings can be applied to resource integrations at other computing sites in general.
Experiment context, if any: ATLAS