In recent years, containerization has revolutionized cloud environments, providing a secure, lightweight, standardized way to package and execute software. Solutions such as Kubernetes enable orchestration of containers in a cluster, including for the purpose of job scheduling. Kubernetes is becoming a de facto standard, available from all major cloud computing providers, and is gaining increased attention from some WLCG sites. In particular, CERN IT has integrated Kubernetes into its cloud infrastructure by providing an interface to instantly create Kubernetes clusters. The University of Victoria, meanwhile, is pursuing an infrastructure-as-code approach to deploying Kubernetes as a flexible and resilient platform for running services and delivering resources.
ATLAS has partnered with CERN IT and the University of Victoria to explore and demonstrate the feasibility of running an ATLAS computing site directly on Kubernetes, replacing all grid computing services. We have interfaced ATLAS’ workload submission engine, PanDA, with Kubernetes to directly submit containerized jobs and monitor their status. This paper describes the integration and deployment details, and focuses on the lessons learned from running a wide variety of ATLAS production payloads on Kubernetes, using clusters of several thousand cores at CERN and at the Tier 2 computing site in Victoria.