Description
The Data Acquisition (DAQ) system of the ATLAS experiment is a large,
distributed and inhomogeneous system: it consists of thousands of
interconnected computers and electronic devices that operate coherently
to read out and select relevant physics data. The advanced diagnostic
capabilities of the Trigger and DAQ (TDAQ) control system are a crucial
feature that contributes significantly to smooth operation and fast
recovery in case of problems and, ultimately, to the high efficiency of
the whole experiment.
The base layer of the verification and diagnostic functionality is a
test management framework. We have developed a flexible test
management system that allows the experts to define and configure
tests for different components, specify follow-up actions for test
failures and describe inter-dependencies between DAQ or detector
elements. This development is based on the experience gained with the
previous test system, which was used during the first three years of
data taking. We found that experts in different domains, or responsible
for different components of the system, need more flexibility to
configure the verification and diagnostic capabilities of the controls
framework, so that these capabilities can later be used in an automated
manner.
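
As an illustration only, the three ingredients named above (tests
attached to components, follow-up actions on failure, and
inter-dependencies between elements) can be sketched in a few lines of
Python. This is not the ATLAS TDAQ API: every name below (Test,
Component, verify, the component names and the action labels) is
hypothetical, and the dependency graph is assumed to be acyclic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Test:
    """A single check on a component (hypothetical model, not the TDAQ API)."""
    name: str
    check: Callable[[], bool]      # returns True when the test passes
    on_failure: str = "none"       # label of the follow-up action to take


@dataclass
class Component:
    """A DAQ or detector element with its tests and its dependencies."""
    name: str
    tests: List[Test] = field(default_factory=list)
    depends_on: List[str] = field(default_factory=list)


def verify(components: Dict[str, Component], target: str,
           results: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Verify `target` after its dependencies (assumes an acyclic graph).

    A component is marked UNTESTED when any dependency did not pass,
    so that the reported failure points at the root cause.
    """
    if results is None:
        results = {}
    if target in results:          # already verified via another path
        return results
    comp = components[target]
    for dep in comp.depends_on:
        verify(components, dep, results)
    if any(results[dep] != "PASSED" for dep in comp.depends_on):
        results[target] = "UNTESTED"
        return results
    for test in comp.tests:
        if not test.check():
            # Stop at the first failing test and record its follow-up action.
            results[target] = f"FAILED:{test.name}->{test.on_failure}"
            return results
    results[target] = "PASSED"
    return results


# Example configuration: the readout depends on the network.
components = {
    "network": Component(
        "network",
        tests=[Test("ping", lambda: False, on_failure="restart_switch")],
    ),
    "readout": Component(
        "readout",
        tests=[Test("alive", lambda: True)],
        depends_on=["network"],
    ),
}
results = verify(components, "readout")
```

In this example the failing network test is reported together with its
follow-up action, while the readout is left untested rather than being
reported as a second, misleading failure.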
In this paper we describe the design and implementation of the test
management system, as well as some aspects of its use during ATLAS
data taking in LHC Run 2.