19–25 Oct 2024
Europe/Zurich timezone

Evolution of the ATLAS TDAQ online software framework towards Phase-II upgrade: use of Kubernetes as an orchestrator of the ATLAS Event Filter computing farm

21 Oct 2024, 17:27
18m
Room 1.C (Small Hall)

Talk | Track 2 - Online and real-time computing | Parallel (Track 2)

Speaker

Alina Corso Radu (University of California Irvine (US))

Description

The ATLAS experiment at the Large Hadron Collider (LHC) at CERN continuously
evolves its Trigger and Data Acquisition (TDAQ) system to meet the challenges
of new physics goals and technological advancements. As ATLAS prepares for
Phase-II Run 4 of the LHC, significant enhancements in the TDAQ Controls and
Configuration tools have been designed to ensure efficient data collection,
processing, and management. This abstract presents the evolution of the ATLAS
TDAQ Controls and Configuration system leading up to Phase-II Run 4. As part of the
evolution towards Phase-II, Kubernetes has been chosen to orchestrate the Event
Filter farm. By leveraging Kubernetes, ATLAS can dynamically allocate computing
resources, scale processing capacity in response to changing data-taking
conditions, and ensure high availability of data processing services. The
integration of Kubernetes with the TDAQ Run Control framework enables
close synchronisation between the experiment's data acquisition components
and the computing infrastructure. We will discuss the architectural
considerations and implementation challenges involved in Kubernetes integration
with the ATLAS TDAQ Controls and Configuration system. We will highlight the
benefits of using Kubernetes as an Event Filter farm orchestrator, including
improved resource utilization, enhanced fault tolerance, and simplified
deployment and management of data processing workflows. In addition, we will
report on extensive testing of Kubernetes conducted on a farm
of 2500 servers within the experiment's data-taking environment, demonstrating
its scalability and robustness in handling the demands of the ATLAS TDAQ system
for Phase-II. The adoption of Kubernetes represents a significant step forward
in the evolution of the ATLAS TDAQ Controls and Configuration system, aligning with
industry best practices in container orchestration and cloud-native computing.
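
As a rough illustration of how a run-control state could drive the scaling of
Event Filter processing, the sketch below maps a state name to a replica count
and builds the merge-patch body that Kubernetes accepts for a Deployment's
`spec.replicas` field. It is not the ATLAS implementation: the state names, the
replica policy, and the function names are all hypothetical, chosen only to
make the idea concrete.

```python
import json

# Hypothetical run-control states and replica policy (assumptions for
# illustration; the abstract does not specify the real state names).
# None means "use the full farm capacity".
REPLICAS_BY_STATE = {
    "INITIAL": 0,
    "CONFIGURED": 1,
    "RUNNING": None,
}


def desired_replicas(state: str, farm_capacity: int) -> int:
    """Return the replica count this sketch assigns to a run-control state."""
    if state not in REPLICAS_BY_STATE:
        raise ValueError(f"unknown run-control state: {state!r}")
    value = REPLICAS_BY_STATE[state]
    # Never request more replicas than the farm can host.
    return farm_capacity if value is None else min(value, farm_capacity)


def scale_patch(state: str, farm_capacity: int) -> str:
    """Build a JSON merge-patch body that sets a Deployment's replica count."""
    body = {"spec": {"replicas": desired_replicas(state, farm_capacity)}}
    return json.dumps(body)


if __name__ == "__main__":
    # Walk through the hypothetical states for a farm of 2500 servers.
    for state in ("INITIAL", "CONFIGURED", "RUNNING"):
        print(state, scale_patch(state, farm_capacity=2500))
```

In a real deployment the patch body would be sent to the Kubernetes API by a
controller that reacts to run-control transitions; that step is omitted here
because the abstract does not describe it.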

Primary authors

ATLAS TDAQ
Alina Corso Radu (University of California Irvine (US))
