21–27 Mar 2009
Prague
Europe/Prague timezone

An Assessment of a Model for Error Processing in the CMS Data Acquisition System

26 Mar 2009, 08:00
1h

Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
Board: Thursday 075
poster Online Computing Poster session

Speaker

Mr Roland Moser (CERN and Technical University of Vienna)

Description

The CMS Data Acquisition System consists of O(1000) interdependent services. A monitoring system that provides exception and application-specific data is essential for operating this cluster. Because of the number of services involved, the volume of monitoring data is greater than a human operator can handle efficiently. Moving the expert knowledge for error analysis from the operator to a dedicated system is therefore a natural choice. Doing so reduces the number of notifications sent to the operator, which simplifies visualization, and provides meaningful descriptions of error causes together with suggestions for possible countermeasures. This paper discusses an architecture for a workflow-based, hierarchical error analysis system based on Guardians for the CMS Data Acquisition System. A Guardian provides a common interface for error analysis of a specific service or subsystem. To provide effective and complete error analysis, the requirements regarding information sources, monitoring and configuration, are analyzed. Formats for common notification types are defined, and a generic Guardian based on Event-Condition-Action rules is presented as a proof of concept.
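
As an illustration only, the Event-Condition-Action idea named in the abstract might be sketched as follows. This is not the authors' implementation: the class names (Event, Diagnosis, Rule, Guardian), the notification fields, the "buffer_status" event kind, and the 0.9 occupancy threshold are all hypothetical and chosen purely for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Event:
    """A monitoring notification (hypothetical format)."""
    source: str                      # service that emitted the notification
    kind: str                        # notification type, e.g. "buffer_status"
    payload: Dict[str, float] = field(default_factory=dict)


@dataclass
class Diagnosis:
    """What the operator sees: an error cause and a suggested countermeasure."""
    cause: str
    countermeasure: str


@dataclass
class Rule:
    """One Event-Condition-Action rule."""
    event_kind: str                          # Event: which notifications trigger the rule
    condition: Callable[[Event], bool]       # Condition: predicate on the notification
    action: Callable[[Event], Diagnosis]     # Action: produce a diagnosis


class Guardian:
    """Analyzes notifications for one service or subsystem (assumed interface)."""

    def __init__(self, name: str, rules: List[Rule]) -> None:
        self.name = name
        self.rules = list(rules)

    def analyze(self, event: Event) -> Optional[Diagnosis]:
        # Return the diagnosis of the first rule whose event kind and
        # condition both match; None means nothing to report.
        for rule in self.rules:
            if rule.event_kind == event.kind and rule.condition(event):
                return rule.action(event)
        return None


def analyze_batch(guardian: Guardian, events: List[Event]) -> List[Diagnosis]:
    """Collapse many notifications into a list of distinct diagnoses."""
    diagnoses: List[Diagnosis] = []
    for event in events:
        diagnosis = guardian.analyze(event)
        if diagnosis is not None and diagnosis not in diagnoses:
            diagnoses.append(diagnosis)
    return diagnoses


def make_example_guardian() -> Guardian:
    # Hypothetical rule: report when a buffer is more than 90% full.
    buffer_rule = Rule(
        event_kind="buffer_status",
        condition=lambda e: e.payload.get("occupancy", 0.0) > 0.9,
        action=lambda e: Diagnosis(
            cause="input buffer nearly full",
            countermeasure="throttle the upstream source",
        ),
    )
    return Guardian("example-subsystem", [buffer_rule])
```

In this sketch, several services reporting the same condition yield a single diagnosis, which mirrors the stated goal of reducing the number of notifications that reach the operator. A hierarchical arrangement could be approximated by letting one Guardian forward its diagnoses as events to a parent Guardian, although the abstract does not specify how the hierarchy is realized.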

Authors

Dr Johannes Gutleber (CERN)
Dr Luciano Orsini (CERN)
Mr Roland Moser (CERN and Technical University of Vienna)
Dr Schahram Dustdar (Technical University of Vienna)
