1–5 Sept 2014
Faculty of Civil Engineering
Europe/Prague timezone

Intelligent operations of the Data Acquisition system of the ATLAS Experiment at the LHC

2 Sept 2014, 08:00
1h

Faculty of Civil Engineering, Czech Technical University in Prague, Thakurova 7/2077, Prague 166 29, Czech Republic
Board: 105
Poster
Computing Technology for Physics Research
Poster session

Speaker

Dr Giuseppe Avolio (CERN)

Description

The ATLAS experiment at the Large Hadron Collider at CERN relies on a complex and highly distributed Trigger and Data Acquisition (TDAQ) system to gather and select particle collision data obtained at unprecedented energy and rates. The TDAQ system is composed of a large number of hardware and software components (about 3000 machines and more than 15000 concurrent processes at the end of LHC's Run 1) which, in a coordinated manner, provide the data-taking functionality of the overall system. The Run Control (RC) system is the component that steers the data acquisition by starting and stopping processes and by carrying all data-taking elements through well-defined states in a coherent way (finite state machine pattern). The RC is organized as a hierarchical tree (run control tree) of run controllers that follows the functional decomposition of the ATLAS detector into systems and sub-systems. Given the size and complexity of the TDAQ system, errors and failures are bound to happen and must be dealt with. The data acquisition system has to recover from these errors promptly and effectively, possibly without the need to stop data-taking operations. In light of this crucial requirement, and taking into account the lessons learnt during LHC's Run 1, the RC has been completely redesigned and reimplemented during the LHC Long Shutdown 1 (LS1) phase. As a result of the new design, the RC is assisted by the Central Hint and Information Processor (CHIP) service, which can truly be considered its "brain". CHIP is an intelligent system with a global view of the TDAQ system. It is based on a third-party open-source Complex Event Processing (CEP) engine, ESPER. CHIP supervises the ATLAS data taking, takes operational decisions and handles abnormal conditions in a remarkably efficient and reliable manner. Furthermore, CHIP automates complex procedures and performs advanced recoveries. In this paper the design, implementation and performance of the RC/CHIP system will be described. Particular emphasis will be put on the way the RC and CHIP cooperate and on the benefits brought by the CEP engine. Additionally, some error recovery scenarios will be analyzed for which the intervention of human experts is no longer necessary.
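
The run control tree and its finite state machine pattern can be illustrated with a minimal sketch. This is not the ATLAS implementation: the state names, the commands, the `Controller` class and the controller names below are all hypothetical, chosen only to show how a command issued at the root might carry every controller in the tree through the same well-defined transition.

```python
from enum import Enum


class State(Enum):
    # Hypothetical, simplified states; the real RC state machine is not given here.
    NONE = 0
    CONFIGURED = 1
    RUNNING = 2


# Hypothetical commands: command -> (required source state, resulting state)
TRANSITIONS = {
    "configure": (State.NONE, State.CONFIGURED),
    "start": (State.CONFIGURED, State.RUNNING),
    "stop": (State.RUNNING, State.CONFIGURED),
    "unconfigure": (State.CONFIGURED, State.NONE),
}


class Controller:
    """One node of an illustrative run control tree."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.state = State.NONE

    def command(self, cmd):
        """Apply a transition to the whole subtree, children first."""
        source, target = TRANSITIONS[cmd]
        if self.state is not source:
            raise RuntimeError(
                f"{self.name}: cannot {cmd} from state {self.state.name}"
            )
        for child in self.children:
            child.command(cmd)
        # The parent changes state only after all children have succeeded.
        self.state = target

    def states(self):
        """Return a flat mapping of controller name -> current state."""
        result = {self.name: self.state}
        for child in self.children:
            result.update(child.states())
        return result


# A small tree mirroring a decomposition into systems and sub-systems.
root = Controller(
    "root",
    [
        Controller("detector_a", [Controller("segment_1")]),
        Controller("detector_b"),
    ],
)
root.command("configure")
root.command("start")
```

In this sketch an invalid command (for example `configure` while running) raises an error instead of changing state. A supervising service in the role of CHIP would react to such conditions, but that logic is not modelled here.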

Primary author

Dr Fabrizio Salvatore (University of Sussex (GB))
