Within the ATLAS detector, the Trigger and Data Acquisition (TDAQ) system is responsible for the online processing of data streamed from the detector during collisions at the Large Hadron Collider at CERN. The online farm comprises ~4000 servers that process the data read out from ~100 million detector channels through multiple trigger levels. Configuring these servers is not an easy task, especially since the detector itself is made up of multiple sub-detectors, each with its own particular requirements.
The previous method of configuring these servers, which used Quattor and a hierarchical system of scripts, was cumbersome and restrictive. A better, unified system was therefore required to simplify the tasks of the TDAQ Systems Administrators, for both locally booted and net-booted systems, and to fulfil the requirements of TDAQ, the Detector Control Systems and the sub-detector groups.
Various configuration management systems were evaluated, and Puppet was ultimately chosen; this was the first such implementation at CERN. In this paper we describe the newly implemented system, detailing the redesign, the configuration and the use of Puppet manifests to ensure a sane state across the entire farm.
Primary keyword: Computing facilities