Speaker
Description
The HEP communities have developed an increasing interest in High Performance Computing (HPC) centres, as these have the potential to provide significant computing resources to current and future experiments. At the Large Hadron Collider (LHC), the ATLAS and CMS experiments face a several-fold increase in computing needs for the High Luminosity LHC (HL-LHC) Run 4, currently foreseen to begin by 2030. HPC platforms are not homogeneous: they expose a wide variety of environments, including proprietary software stacks, each with its own set of restrictions. The barrier to integrating them with the highly dynamic needs of running HEP experiments remains high, and ad hoc solutions are typically needed to make use of such centres. A common approach that uses compute resources efficiently while abstracting away the specifics of a particular machine is thus highly desirable. This work presents an integration technique developed for running ATLAS and CMS computational workloads on the LUMI supercomputer and designed to be HPC centre agnostic. It leverages open source tools such as the Advanced Resource Connector (ARC) middleware, CernVM-FS (CVMFS), SSH Filesystem (SSHFS) and common containerisation techniques, and enhances them with novel tools to overcome limitations of the container runtime provided by the HPC centre. The fapptainer [1] tool implements un-nesting of containers, running them sideways instead, without any modification to the workflow of the jobs. The tools run unprivileged and as such do not require system modifications by the local sysadmins. The proposed technique can be used to integrate any HPC system that offers inbound SSH access and a standard container runtime, by means of an ARC Computing Element node located close to it.
A wide range of current and future HPC machines meets the specified requirements, thus enabling wider adoption of such tools by the HEP community to integrate HPC resources.
[1] Fapptainer software - https://source.coderefinery.org/slu/fapptainer
Significance
The presented solution enables connecting HPC resources to the distributed computing infrastructures of HEP experiments as regular WLCG grid sites. It provides a generalised way of running computations independently of workload type and HPC environment specifics, using stock software components and a few open-source tools developed in-house. The tools presented are novel and overcome difficulties and limitations of local HPC environments. They also operate unprivileged, without requiring any modifications from the sysadmins.
| Experiment context, if any | ATLAS and CMS experiments |
|---|---|