21–25 May 2012
New York City, NY, USA
US/Eastern timezone

Automated Inventory and Monitoring of the ALICE HLT Cluster Resources with the SysMES Framework

24 May 2012, 13:30
4h 45m
Rosenthal Pavilion (10th floor) (Kimmel Center)

Poster Online Computing (track 1) Poster Session

Speaker

Jochen Ulrich (Johann-Wolfgang-Goethe Univ. (DE))

Description

The High-Level-Trigger (HLT) cluster of the ALICE experiment is a computer cluster with about 200 nodes and 20 infrastructure machines. In its current state, the cluster consists of nearly 10 different node configurations in terms of installed hardware, software and network structure. In such a heterogeneous environment with a distributed application, information about the actual configuration of the nodes is needed to distribute and adjust the application automatically. An inventory database provides a unified interface to such information. To be useful, the data in the inventory has to be up to date, complete and self-consistent. Manual maintenance of such databases is error-prone, and the data tends to become outdated.
The inventory module of the ALICE HLT cluster overcomes these drawbacks by periodically and automatically updating the actual state. In contrast to existing solutions, it also allows a target state to be defined for each node. A target state can simply be a fully operational state, i.e. a state without malfunctions, or a dedicated configuration of the node. The target state is then compared to the actual state to detect deviations and malfunctions that could cause severe problems when running the application.
The inventory module has been integrated into the monitoring and management framework SysMES in order to reuse existing functionality such as transactionality, monitors and clients. Additionally, SysMES allows detected problems to be resolved automatically via its rule system. To describe the heterogeneous environment with all its specifics, such as custom hardware, the inventory module uses an object-oriented model based on the Common Information Model. To summarize, the inventory module provides an automatically updated actual state of the cluster, detects discrepancies between the actual and the target state, and can resolve detected problems automatically.
This contribution presents the current implementation state of the inventory module as well as its planned future development.
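
The comparison of a target state with an actual state can be illustrated with a minimal Python sketch. It is not the SysMES implementation and does not use the Common Information Model; the names `Deviation` and `find_deviations`, the node name and the attribute keys are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deviation:
    """One mismatch between target and actual state (hypothetical type)."""
    node: str
    attribute: str
    expected: object
    actual: object


def find_deviations(node, target, actual):
    """Return every target attribute whose actual value differs.

    `target` and `actual` are plain dicts mapping attribute names to values.
    An attribute missing from `actual` is reported with an actual value of None.
    Attributes present only in `actual` are ignored.
    """
    deviations = []
    for attribute in sorted(target):
        expected = target[attribute]
        observed = actual.get(attribute)
        if observed != expected:
            deviations.append(Deviation(node, attribute, expected, observed))
    return deviations


if __name__ == "__main__":
    # Hypothetical example data, not taken from the ALICE HLT cluster.
    target_state = {"memory_gb": 24, "gpu_count": 1, "network_link": "up"}
    actual_state = {"memory_gb": 16, "gpu_count": 1, "network_link": "up"}
    for dev in find_deviations("node042", target_state, actual_state):
        print(f"{dev.node}: {dev.attribute} expected {dev.expected!r}, "
              f"found {dev.actual!r}")
```

In a setup like the one described, each reported deviation could serve as the input to a rule that triggers a corrective action; how SysMES rules are actually expressed is not covered by this sketch.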

Primary author

Jochen Ulrich (Johann-Wolfgang-Goethe Univ. (DE))

Co-authors

Camilo Ernesto Lara Martinez (Johann-Wolfgang-Goethe Univ. (DE))
Prof. Dieter Roehrich (University of Bergen (NO))
Oystein Haaland (University of Bergen (NO))
Stefan Boettger (Kirchhoff-Institut fuer Physik (KIP)-Ruprecht-Karls-Universitaet)
Prof. Udo Wolfgang Kebschull (Johann-Wolfgang-Goethe Univ. (DE))
