Speaker
Dr Giuseppe Avolio (CERN)
Description
The ATLAS experiment at the Large Hadron Collider (LHC) at CERN relies on a complex and highly distributed Trigger and Data Acquisition (TDAQ) system to gather and select particle collision data produced at unprecedented energies and rates. The TDAQ system is composed of a large number of hardware and software components (about 3000 machines and more than 15000 concurrent processes at the end of LHC Run 1), which work in a coordinated manner to provide the data-taking functionality of the overall system.
The Run Control (RC) system is the component that steers data acquisition: it starts and stops processes and carries all data-taking elements through well-defined states in a coherent way (following the finite state machine pattern). The RC is organized as a hierarchical tree of run controllers (the run control tree) that mirrors the functional decomposition of the ATLAS detector into systems and sub-systems.
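To illustrate how a command can travel down such a tree of state machines, here is a minimal Python sketch. The states, commands and controller names are hypothetical and not taken from the ATLAS TDAQ software; the sketch also omits timeouts, error handling and rollback.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical FSM: (current state, command) -> next state.
# These names are illustrative only, not the real ATLAS TDAQ states.
TRANSITIONS = {
    ("INITIAL", "configure"): "CONFIGURED",
    ("CONFIGURED", "start"): "RUNNING",
    ("RUNNING", "stop"): "CONFIGURED",
    ("CONFIGURED", "unconfigure"): "INITIAL",
}


@dataclass
class Controller:
    """One node of a (hypothetical) run control tree."""

    name: str
    children: list[Controller] = field(default_factory=list)
    state: str = "INITIAL"

    def handle(self, command: str) -> None:
        # Reject commands that are not valid in the current state.
        target = TRANSITIONS.get((self.state, command))
        if target is None:
            raise ValueError(f"{self.name}: '{command}' not allowed in state {self.state}")
        # Drive the subtree first; this sketch does no rollback if a child fails.
        for child in self.children:
            child.handle(command)
        self.state = target

    def subtree_states(self) -> dict[str, str]:
        # Collect the state of this controller and of all its descendants.
        states = {self.name: self.state}
        for child in self.children:
            states.update(child.subtree_states())
        return states


# Usage with hypothetical controller names.
root = Controller("Root", [Controller("SystemA", [Controller("SubSystemA1")]), Controller("SystemB")])
root.handle("configure")
root.handle("start")
print(root.subtree_states())
# {'Root': 'RUNNING', 'SystemA': 'RUNNING', 'SubSystemA1': 'RUNNING', 'SystemB': 'RUNNING'}
```

Because children are driven before the parent updates its own state, a controller only reports the target state once its whole subtree has reached it. A production system would additionally need to cope with children that fail part-way through a transition.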
Given the size and complexity of the TDAQ system, errors and failures are bound to happen and must be dealt with. The data acquisition system has to recover from these errors promptly and effectively, ideally without stopping data-taking operations. In light of this crucial requirement, and taking into account the lessons learnt during LHC Run 1, the RC has been completely redesigned and re-implemented during the LHC Long Shutdown 1 (LS1) phase.
As a result of the new design, the RC is assisted by the Central Hint and Information Processor (CHIP) service, which can truly be considered its "brain". CHIP is an intelligent system with a global view of the TDAQ system. It is based on ESPER, a third-party, open-source Complex Event Processing (CEP) engine. CHIP supervises ATLAS data taking, takes operational decisions and handles abnormal conditions in a remarkably efficient and reliable manner. Furthermore, CHIP automates complex procedures and performs advanced recoveries.
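A CEP engine evaluates rules continuously over a stream of events, typically within sliding time windows. The plain-Python sketch below mimics one such rule without using ESPER or its API: if the same application reports at least `threshold` errors within `window_s` seconds, a recovery hint is emitted. The event type, the rule, its parameters and the "restart" hint are all hypothetical illustrations, not CHIP's actual rules.

```python
from __future__ import annotations

from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorEvent:
    # Hypothetical event: which application reported an error, and when (in seconds).
    application: str
    timestamp: float


class RepeatedErrorRule:
    """Hypothetical rule: hint a restart when one application reports
    at least `threshold` errors within `window_s` seconds."""

    def __init__(self, threshold: int = 3, window_s: float = 30.0) -> None:
        self.threshold = threshold
        self.window_s = window_s
        # Per-application timestamps of recent errors (assumed to arrive in time order).
        self._recent: defaultdict[str, deque[float]] = defaultdict(deque)

    def on_event(self, event: ErrorEvent) -> str | None:
        times = self._recent[event.application]
        times.append(event.timestamp)
        # Slide the window: drop errors older than window_s relative to the newest one.
        while event.timestamp - times[0] > self.window_s:
            times.popleft()
        if len(times) >= self.threshold:
            times.clear()  # reset so that one burst produces a single hint
            return f"restart {event.application}"
        return None


# Usage with hypothetical application names.
rule = RepeatedErrorRule(threshold=3, window_s=30.0)
events = [ErrorEvent("appA", 0.0), ErrorEvent("appA", 10.0),
          ErrorEvent("appB", 12.0), ErrorEvent("appA", 25.0)]
hints = [hint for e in events if (hint := rule.on_event(e)) is not None]
print(hints)  # ['restart appA']
```

In a real CEP engine such a rule would be expressed declaratively and evaluated by the engine itself; the hand-written window here only shows the kind of correlation over time that lets a supervisor react to patterns rather than to single events.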
In this paper the design, implementation and performance of the RC/CHIP system will be described. Particular emphasis will be placed on the way the RC and CHIP cooperate and on the benefits brought by the CEP engine. Additionally, some error recovery scenarios will be analyzed for which the intervention of human experts is no longer necessary.
Primary author
Dr Fabrizio Salvatore (University of Sussex (GB))