Description
At many Worldwide LHC Computing Grid (WLCG) sites, HPC resources are already integrated, or will be integrated in the near future, into the experiment-specific workflows. The integration can be done either opportunistically, using otherwise idle resources for a limited period of time, or permanently. The WLCG ATLAS Tier-2 cluster in Freiburg has been extended in both ways: opportunistic use of resources from the NEMO HPC cluster in Freiburg and permanent use of the HoreKa HPC cluster at KIT.
In order to integrate the computing resources into the Tier-2 cluster in Freiburg in a manner that is both transparent and efficient, a container-based approach was adopted, utilising the meta-scheduler COBalD/TARDIS. TARDIS launches so-called drones on the HPC clusters, which provide the Tier-2 cluster with additional resources. To differentiate these augmented resources from their counterparts installed in Freiburg, accounting is handled by the AUDITOR accounting ecosystem.
The compute hardware of the local Tier-2 cluster and the NEMO HPC cluster are largely identical and were replaced simultaneously. This facilitated a comprehensive analysis of the impact of various factors. First, the performance of identical hardware was compared between the bare-metal installation of a typical WLCG compute server and drones on the HPC clusters. Furthermore, the influence of direct access from the Freiburg Tier-2 cluster and the Freiburg HPC cluster to the Freiburg-based storage, as well as remote access from HoreKa in Karlsruhe, was analysed. Finally, the impact of varying drone sizes was investigated. These results will have a significant impact on the German HEP community's computing strategy for the next 5-10 years.
Significance
The integration of HPC resources into the workflow of LHC experiments has been an ongoing process. However, the transparent integration of such resources is particularly challenging due to the strict usage requirements of HPC centres, which typically do not permit privileged access. The aim of this contribution is twofold: firstly, to describe the transparent integration of HPC resources, and secondly, to take advantage of the unique opportunity to compare identical hardware in standard WLCG operation and on an HPC cluster with minimal spatial separation, as well as different hardware at a moderate spatial distance but with very good network connectivity. The results of the analysis enable a more detailed examination of the individual factors that contribute to the performance of dynamically integrated HPC resources. Consequently, these findings can be applied to resource integrations at other computing sites in general.
Experiment context, if any: ATLAS