Description
The development of ecosystems for high-energy physics analysis is experiencing a strong push towards cloud-native frameworks, especially for the most interactive, plotting-based "last mile". Together with the increasing adoption of, and R&D around, ML-based algorithms, this creates demand for ways to extend a Kubernetes cluster over a range of existing resources that are, in practice, remote and most likely managed by a batch system of some sort (SLURM, HTCondor, etc.). One of the main limitations of the models that try to address this issue is that many of the frameworks rely on the Kubernetes internal pod network to orchestrate and manage workflows. When executed remotely (especially on a supercomputer), those containers, as per the default of the Singularity/Apptainer runtime, share the host network namespace and are only allowed to perform user-level operations, which prevents the creation of any network interfaces.
In this presentation, we will show our experience with offloading pod execution via InterLink from production-level Kubernetes clusters to a EuroHPC center such as Leonardo at CINECA, all while preserving the pod overlay network for workflow coordination tasks by leveraging Linux kernel network namespaces. We will show how several of the adopted frameworks (such as Ray, Kubeflow, and Argo Workflows) can leverage this solution, expanding the range of possibilities for distributed exploitation of non-Kubernetes resources through Kubernetes orchestration.