Talk
Title Applications of advanced data analysis and expert system technologies in ATLAS Trigger-DAQ Controls framework
Author(s) Avolio, Giuseppe (speaker) (University of California Irvine (US))
Corporate author(s) CERN. Geneva
Imprint 2012-05-21. - Streaming video, 00:31:54:00.
Series (Conferences)
(Computing in High Energy and Nuclear Physics (CHEP) 2012)
Lecture note on 2012-05-21T14:45:00
Subject category Conferences
Abstract The Trigger and DAQ (TDAQ) system of the ATLAS experiment is a very complex distributed computing system, composed of O(10000) applications running on more than 2000 computers. The TDAQ Controls system has to guarantee the smooth and synchronous operation of all TDAQ components. It also has to provide the means to minimize system downtime caused by runtime failures, which are inevitable in a system of such scale and complexity. During data-taking runs, the streams of information messages sent or published by TDAQ applications are the main source of knowledge about the correctness of ongoing operations. Experts constantly monitor the huge flow of operational monitoring data (produced at an average rate of O(1-10) kHz) to detect problems or misbehavior. Given the scale of the system and the data rates to be analyzed, automating the Controls system functionality is a strong requirement in the areas of operational monitoring, system verification, error detection and recovery. Automation reduces the manpower needed for operations and ensures a consistently high quality of problem detection and subsequent recovery. To accomplish this objective, the Controls system includes high-level components based on advanced software technologies, namely rule-based expert system (ES) and complex event processing (CEP) engines. These techniques make it possible to formalize, store and reuse the TDAQ experts' knowledge in the Controls framework, and thus to assist the TDAQ shift crew in accomplishing its tasks. The DVS (Diagnostics and Verification System) and Online Recovery components are responsible for automating system testing and verification, failure diagnostics and recovery procedures. These components are built on a common forward-chaining ES framework (based on the CLIPS expert system shell). The framework allows the behavior of a system to be programmed in terms of “if-then” rules, and it lets the knowledge base be easily extended or modified.
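The forward-chaining idea behind DVS and Online Recovery can be illustrated with a deliberately naive Python sketch. It is not CLIPS, which has its own rule language and matching engine, and it is not the ATLAS rule base. The fact shapes (`("app_state", name, state)` and so on), the rule names and the application names are hypothetical. The engine fires every "if-then" rule against the known facts and adds whatever the rule derives. It repeats this until a pass adds nothing new, so derived facts can trigger further rules.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Set, Tuple

# A fact is a plain tuple, e.g. ("app_state", "app-A", "DEAD") (hypothetical shape).
Fact = Tuple[str, ...]


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[Fact], bool]          # the "if" part, tested per fact
    action: Callable[[Fact], Iterable[Fact]]   # the "then" part, derives new facts


def forward_chain(facts: Iterable[Fact], rules: Iterable[Rule]) -> Set[Fact]:
    """Fire rules repeatedly until no new facts are derived (a fixpoint)."""
    known: Set[Fact] = set(facts)
    rule_list = list(rules)
    changed = True
    while changed:
        changed = False
        for rule in rule_list:
            for fact in list(known):           # snapshot: safe to add while iterating
                if rule.condition(fact):
                    for new in rule.action(fact):
                        if new not in known:
                            known.add(new)
                            changed = True
    return known


# Hypothetical knowledge base: a dead application is a failure,
# and a failure triggers a restart action.
rules = [
    Rule("dead-app-is-failure",
         lambda f: f[0] == "app_state" and f[2] == "DEAD",
         lambda f: [("failure", f[1])]),
    Rule("restart-failed-app",
         lambda f: f[0] == "failure",
         lambda f: [("action", "restart", f[1])]),
]

result = forward_chain(
    [("app_state", "app-A", "DEAD"), ("app_state", "app-B", "RUNNING")], rules)
```

Extending the behavior means appending another `Rule` to the list, without touching the engine. This mirrors the point that the knowledge base can be changed independently of the framework. A production shell avoids re-testing every rule against every fact on each pass; this sketch does not.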
The core of the AAL (Automated monitoring and AnaLysis) component is a CEP (Complex Event Processing) engine implemented in Java using ESPER. The engine is loaded with a set of directives. It correlates and analyzes operational messages and events and produces operator-friendly alerts, which help TDAQ operators react promptly to problems or perform important routine tasks. Shifters know the component as the "Shifter Assistant" (SA), and its introduction made it possible to reduce the number of shifters in the ATLAS control room. The design foresees a machine-learning module to detect anomalies and problems that cannot be defined in advance. The described components are in constant use for ATLAS Trigger-DAQ system operations, and the knowledge base keeps growing as more expertise is acquired. By the end of 2011 the knowledge base used for TDAQ operations contained about 300 rules. The paper presents the design and current implementation of the components, as well as the experience of using them in the real operational environment of the ATLAS experiment.
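To make "correlation of operational messages" concrete, here is a hypothetical Python sketch of one simple CEP-style pattern: a per-application sliding time window. It is not ESPER and not an AAL directive. The real component expresses its correlations as directives loaded into the ESPER engine. The `Message` fields, the threshold and the window length are assumptions for illustration. The detector raises one alert when a single application emits at least `threshold` ERROR messages within `window` seconds. It assumes messages arrive in timestamp order.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass(frozen=True)
class Message:
    timestamp: float   # seconds (hypothetical field)
    app: str           # sending application (hypothetical field)
    severity: str      # e.g. "ERROR", "WARNING", "INFO"


class ErrorBurstDetector:
    """Alert when one application emits >= threshold ERRORs within window seconds."""

    def __init__(self, threshold: int, window: float) -> None:
        self.threshold = threshold
        self.window = window
        # Timestamps of recent ERROR messages, kept separately per application.
        self._recent: Dict[str, Deque[float]] = defaultdict(deque)

    def on_message(self, msg: Message) -> List[str]:
        if msg.severity != "ERROR":
            return []
        times = self._recent[msg.app]
        times.append(msg.timestamp)
        # Evict events that have slid out of the time window.
        while times and msg.timestamp - times[0] > self.window:
            times.popleft()
        if len(times) >= self.threshold:
            times.clear()  # reset so a single burst yields a single alert
            return [f"{msg.app}: {self.threshold} errors within {self.window:g} s"]
        return []


detector = ErrorBurstDetector(threshold=3, window=10.0)
stream = [
    Message(0.0, "app-A", "ERROR"),
    Message(1.0, "app-B", "INFO"),
    Message(4.0, "app-A", "ERROR"),
    Message(9.0, "app-A", "ERROR"),
    Message(30.0, "app-A", "ERROR"),
]
alerts = [alert for m in stream for alert in detector.on_message(m)]
```

Here `alerts` holds a single entry for `app-A`, raised by the third error at t = 9 s. Turning a raw message stream into one human-readable alert is the kind of operator-friendly output described above. A real CEP engine lets such patterns be declared rather than hand-coded.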
Copyright/License © 2012-2024 CERN
Submitted by jd@bnl.gov

Record created 2012-07-09, last modified 2022-11-02

