9–13 Jul 2018
Sofia, Bulgaria
Europe/Sofia timezone

Experience with Shifter Assistant: an intelligent tool to help operations of ATLAS TDAQ system in LHC Run 2

9 Jul 2018, 11:45
15m
Hall 3.1 (National Palace of Culture)

Presentation, Track 1 - Online computing

Speaker

Andrei Kazarov (Petersburg Nuclear Physics Institut (RU))

Description

The Trigger and DAQ (TDAQ) system of the ATLAS experiment is a complex
distributed computing system, composed of O(30000) applications
running on a farm of computers. The system is operated by a crew of
shift operators. An important aspect of operations is to minimize
system downtime caused by runtime failures and by operational factors
such as human error, lack of awareness, and miscommunication.

The paper describes recent developments in one of the “intelligent”
TDAQ frameworks, the Shifter Assistant (SA), and summarizes the
experience of its use in ATLAS operations during LHC Run 2.

SA is a framework whose main aim is to automate routine system
checks, error detection and diagnosis, event correlation, etc., in
order to help the operators react to runtime problems promptly and
effectively. The tool is based on CEP (Complex Event Processing)
technology. It continuously processes the stream of operational events
(at rates of O(100 kHz)) against a set of “directives” (or rules) in
the knowledge base, producing human-oriented alerts and making shifters
aware of operational issues.
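To make the stream-against-directives idea concrete, here is a minimal Python sketch of one kind of rule a CEP knowledge base can hold: a sliding-window threshold that raises a single alert when one application reports repeated errors. It is not the SA implementation and does not use any CEP engine's API; the `Event` and `Alert` records, the `RepeatedErrorDirective` class, the `process` loop, the severity strings and the application names are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since some reference time
    source: str       # hypothetical: name of the reporting application
    severity: str     # e.g. "ERROR", "WARNING", "INFO"
    message: str


@dataclass(frozen=True)
class Alert:
    timestamp: float
    directive: str
    text: str


class RepeatedErrorDirective:
    """Hypothetical directive: alert when a single application reports at
    least `threshold` ERROR events within a sliding window of `window_s` s."""

    def __init__(self, name: str, threshold: int, window_s: float) -> None:
        self.name = name
        self.threshold = threshold
        self.window_s = window_s
        # Per-application timestamps of recent ERROR events.
        self._recent: Dict[str, Deque[float]] = defaultdict(deque)

    def on_event(self, ev: Event) -> Optional[Alert]:
        if ev.severity != "ERROR":
            return None  # this directive only matches errors
        times = self._recent[ev.source]
        times.append(ev.timestamp)
        # Discard errors older than the window, relative to the newest event.
        while times and ev.timestamp - times[0] > self.window_s:
            times.popleft()
        if len(times) >= self.threshold:
            times.clear()  # reset so that one burst yields one alert
            return Alert(ev.timestamp, self.name,
                         f"{ev.source}: {self.threshold} errors within {self.window_s}s")
        return None


def process(stream: Iterable[Event],
            directives: List[RepeatedErrorDirective]) -> List[Alert]:
    """Feed every event to every directive and collect the alerts produced."""
    alerts: List[Alert] = []
    for ev in stream:
        for d in directives:
            alert = d.on_event(ev)
            if alert is not None:
                alerts.append(alert)
    return alerts


# Usage with a small, made-up event stream.
events = [
    Event(0.0, "app-A", "ERROR", "timeout"),
    Event(1.0, "app-A", "INFO", "recovered"),
    Event(2.0, "app-A", "ERROR", "timeout"),
    Event(3.0, "app-B", "ERROR", "link down"),
    Event(4.0, "app-A", "ERROR", "timeout"),
]
directives = [RepeatedErrorDirective("repeated-errors", threshold=3, window_s=10.0)]
alerts = process(events, directives)
for a in alerts:
    print(a.timestamp, a.directive, a.text)  # prints: 4.0 repeated-errors app-A: 3 errors within 10.0s
```

The design choice worth noting is that each directive keeps only the small amount of state it needs (a bounded window per source), which is what lets CEP-style rules keep up with a high-rate stream instead of querying a stored history.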

More than 200 directives were developed by TDAQ and detector experts
for different domains. In this paper we also describe different types
of directives and present examples of the most interesting ones,
demonstrating the power of CEP for this type of application.

Primary authors

Jiri Masik (University of Manchester (GB)), Andrei Kazarov (Petersburg Nuclear Physics Institut (RU))