Description
The ATLAS experiment at the Large Hadron Collider (LHC) at CERN continuously
evolves its Trigger and Data Acquisition (TDAQ) system to meet the challenges
of new physics goals and technological advancements. As ATLAS prepares for the
Phase-II Run 4 of the LHC, significant enhancements in the TDAQ Controls and
Configuration tools have been designed to ensure efficient data collection,
processing, and management. This abstract presents the evolution of the ATLAS TDAQ
Controls and Configuration system leading up to Phase-II Run 4. As part of the
evolution towards Phase-II, Kubernetes has been chosen to orchestrate the Event
Filter farm. By leveraging Kubernetes, ATLAS can dynamically allocate computing
resources, scale processing capacity in response to changing data-taking
conditions, and ensure high availability of data processing services. The
integration of Kubernetes with the TDAQ Run Control framework enables
close synchronisation between the experiment's data acquisition components
and the computing infrastructure. We will discuss the architectural
considerations and implementation challenges involved in Kubernetes integration
with the ATLAS TDAQ Controls and Configuration system. We will highlight the
benefits of using Kubernetes as an Event Filter farm orchestrator, including
improved resource utilization, enhanced fault tolerance, and simplified
deployment and management of data processing workflows. In addition, we will
report on the extensive testing of Kubernetes that was conducted using a farm
of 2500 servers within the experiment's data-taking environment, demonstrating
its scalability and robustness in handling the demands of the ATLAS TDAQ system
for Phase-II. The adoption of Kubernetes represents a significant step forward
in the evolution of the ATLAS TDAQ Controls and Configuration system, aligning it with
industry best practices in container orchestration and cloud-native computing.
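
As a rough illustration of how Run Control transitions might drive the size of the Event Filter farm, the sketch below maps a run state to a desired replica count and builds the corresponding scale request. It is not the ATLAS implementation: the state names, the `FarmPolicy` class, the per-state percentages, and the idea of modelling the farm as a single scalable workload are all assumptions made for this example. Only the farm size of 2500 servers comes from the text above, and `spec.replicas` is the standard Kubernetes field for a workload's replica count.

```python
from dataclasses import dataclass
from enum import Enum


class RunState(Enum):
    """Hypothetical, simplified run-control states (illustrative only)."""
    INITIAL = "INITIAL"
    CONFIGURED = "CONFIGURED"
    RUNNING = "RUNNING"


@dataclass(frozen=True)
class FarmPolicy:
    """Assumed scaling policy: percentage of farm nodes active per state."""
    total_nodes: int
    pods_per_node: int
    active_percent: dict  # RunState -> integer percentage (0..100)

    def desired_replicas(self, state: RunState) -> int:
        # Integer arithmetic avoids floating-point rounding surprises.
        active_nodes = self.total_nodes * self.active_percent[state] // 100
        return active_nodes * self.pods_per_node


def scale_patch(replicas: int) -> dict:
    """Build a patch body that sets spec.replicas on a scalable workload."""
    if replicas < 0:
        raise ValueError("replica count must be non-negative")
    return {"spec": {"replicas": replicas}}


# Assumed policy values; only the 2500-server farm size is from the abstract.
policy = FarmPolicy(
    total_nodes=2500,
    pods_per_node=1,
    active_percent={
        RunState.INITIAL: 0,
        RunState.CONFIGURED: 10,
        RunState.RUNNING: 100,
    },
)

if __name__ == "__main__":
    for state in RunState:
        print(state.value, scale_patch(policy.desired_replicas(state)))
```

In a real deployment, a patch body of this shape could be submitted through a Kubernetes client (for example `AppsV1Api.patch_namespaced_deployment_scale` in the official Python client) when the run-control state changes. Keeping the policy a pure function of the state makes it easy to test without a cluster.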