Sep 2 – 9, 2007
Victoria, Canada
Europe/Zurich timezone

Integrated RT-AT-Nagios system at BNL USATLAS Tier1 computing center

Sep 3, 2007, 8:00 AM
10h 10m
Victoria, Canada

Board: 87
Type: Poster
Track: Computer facilities, production grids and networking
Session: Poster 1

Speaker

Tomasz Wlodek (Brookhaven National Laboratory)

Description

Managing a large number of heterogeneous grid servers with different service requirements poses great challenges. We describe a cost-effective integrated operations framework that manages the hardware inventory, monitors services, raises alarms at different severity levels, and tracks the facility's response to them. The system is built from open source components: RT (Request Tracker) tracks user requests, AT (Asset Tracker) manages the site inventory, and Nagios performs facility monitoring. We will discuss the integration of these components. AT serves as the central repository for information about machines, services, groups of machines and services, their interdependencies, and their configuration. Problem reports sent to RT by users are reflected in the asset history stored in the AT database. Nagios uses AT to obtain information about the components to be monitored. Detected problems are classified according to their severity, reported to experts, and fed into RT, where progress towards their resolution is tracked. The paper will describe the AT data model, the integration between AT and Nagios, and the interfacing of RT to other problem tracking systems. The described system provides a scalable way to commission grid servers, automates the otherwise error-prone manual system configuration, and leverages the existing ticket system for problem tracking. It allows BNL to operate its Tier1 facility 24x7 and to meet the service level agreements for each WLCG grid middleware component, each of which has a different class of service requirements.
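
To illustrate the idea of Nagios obtaining its list of monitored components from the asset inventory, the following is a minimal sketch, not the BNL implementation. It assumes a hypothetical `Asset` record (the abstract does not give the AT data model) and renders it into Nagios object definitions. The `generic-host` and `generic-service` templates are those found in the Nagios sample configuration and are assumed to exist at the site; the host names, addresses, and check commands are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical inventory record; the real AT schema is not given in the abstract."""
    name: str
    address: str
    group: str
    # Maps a service description to the Nagios check command that monitors it.
    services: dict = field(default_factory=dict)


def render_host(asset):
    """Render one Nagios 'define host' block for an asset."""
    return "\n".join([
        "define host {",
        "    use         generic-host",  # assumed site template
        f"    host_name   {asset.name}",
        f"    address     {asset.address}",
        f"    hostgroups  {asset.group}",
        "}",
    ])


def render_services(asset):
    """Render one 'define service' block per service recorded for the asset."""
    blocks = []
    for description, command in sorted(asset.services.items()):
        blocks.append("\n".join([
            "define service {",
            "    use                  generic-service",  # assumed site template
            f"    host_name            {asset.name}",
            f"    service_description  {description}",
            f"    check_command        {command}",
            "}",
        ]))
    return blocks


def render_config(assets):
    """Render the full configuration text, with hosts in name order."""
    blocks = []
    for asset in sorted(assets, key=lambda a: a.name):
        blocks.append(render_host(asset))
        blocks.extend(render_services(asset))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    inventory = [
        Asset("gridftp01", "192.0.2.10", "storage", {"GridFTP": "check_tcp!2811"}),
        Asset("ce01", "192.0.2.20", "compute", {"SSH": "check_ssh"}),
    ]
    print(render_config(inventory))
```

Generating the configuration from the inventory, rather than editing it by hand, is what removes the manual step when a server is commissioned. In a real deployment the host groups named here would also need matching `define hostgroup` entries.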

Primary author

Tomasz Wlodek (Brookhaven National Laboratory)

Co-authors

Carlos Gamboa (Brookhaven National Laboratory)
Dantong Yu (Brookhaven National Laboratory)
Jason Smith (Brookhaven National Laboratory)
Robert Petkus (Brookhaven National Laboratory)
Shigeki Misawa (Brookhaven National Laboratory)
Tom Throwe (Brookhaven National Laboratory)
Yingzi Wu (Brookhaven National Laboratory)
Zhenping Liu (Brookhaven National Laboratory)
