25–29 May 2026
Chulalongkorn University
Asia/Bangkok timezone

Toward an IPv6-native, cloud-native ATLAS site: Scalable production-ready grid storage on Kubernetes

27 May 2026, 13:45
18m
Chulalongkorn University

Oral Presentation Track 7 - Computing infrastructure and sustainability

Speaker

Ryan Taylor (University of Victoria (CA))

Description

The increasing computational scale and complexity of frontier scientific experiments, such as the ATLAS experiment at the Large Hadron Collider, continue to motivate a drive toward operational models that are resilient, automated, reproducible, and scalable. The University of Victoria (UVic) remains at the forefront of advancing cloud-native deployment patterns to address these challenges. Previous work established a cloud-native architecture for a complete ATLAS Tier 2 site on Kubernetes, including a functional prototype EOS storage element, but relied on a basic IPv4-only network design for the Kubernetes cluster. To overcome scalability and performance limitations associated with load balancing and software-defined routing in OpenStack, and to satisfy ATLAS inter-site connectivity requirements, we designed a new cluster network architecture using direct-attached IPv6 addresses. By switching to eBPF-based technology, we also improved the performance, scalability, observability, and robustness of the container network plane and streamlined service routing. Moreover, we migrated to an advanced load balancer capable of locality-aware address assignment, reducing latency and eliminating redundant lateral traffic flows within the cluster. Following these enhancements, we assess bandwidth scalability, present benchmarks, and demonstrate a significant performance optimization using the EOS shared filesystem redirection feature for direct CephFS access. Finally, we describe additional improvements to the EOS Helm chart and, based on production experience, the operational benefits of a fully containerized cloud-native deployment.

Authors

ATLAS Computing
Ryan Taylor (University of Victoria (CA))
