CHEP 07

Name: CHEP 07
Start: 2007-09-02T08:00:00+02:00
End: 2007-09-09T12:00:00+02:00
Location: Victoria, Canada

2–9 Sept 2007


Europe/Zurich timezone

Please book accommodation as soon as possible.

Support

chep07-support@triumf.ca

Integrated RT-AT-Nagios system at BNL USATLAS Tier1 computing center

3 Sept 2007, 08:00

10h 10m

Victoria, Canada

Board: 87

Type: poster
Track: Computer facilities, production grids and networking
Session: Poster 1

Speaker: Tomasz Wlodek (Brookhaven National Laboratory)

Managing a large number of heterogeneous grid servers with different service requirements poses great challenges. We describe a cost-effective integrated operation framework that manages hardware inventory, monitors services, raises alarms with different severity levels, and tracks the facility's response to them. The system is based on open source components: RT (Request Tracking) tracks user requests, AT (Asset Tracking) manages the site inventory, and Nagios performs facility monitoring. We will discuss the integration of these components. AT serves as the central repository for information about machines, services, groups of machines and services, their interdependencies, and their configuration. Problem reports sent to RT by users are reflected in the asset history stored in the AT database. Nagios uses AT to obtain information about the components to be monitored. Detected problems are classified according to their severity, reported to experts, and fed into RT, where progress towards their resolution is tracked. The paper will describe the AT data model, the integration between AT and Nagios, and the interfacing of RT to other problem tracking systems. The described system provides a scalable way to commission grid servers, automates error-prone manual system configuration, and leverages the existing ticket system for problem tracking. It allows BNL to operate its Tier1 facility 24x7 and to meet the service level agreements for each WLCG grid middleware component, which have different classes of service requirements.
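The abstract says that Nagios obtains the list of components to monitor from AT, but gives no implementation details. As an illustration only, the following minimal sketch shows one way inventory records could be turned into Nagios object definitions. The Asset record, its fields, the example host, and the check_<service> command names are all hypothetical and are not taken from the BNL system; "generic-host" and "generic-service" are assumed to be templates defined elsewhere in the Nagios configuration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Asset:
    """Hypothetical inventory record (the real AT data model is not given here)."""
    hostname: str
    address: str
    services: List[str] = field(default_factory=list)


def host_block(asset: Asset, template: str = "generic-host") -> str:
    """Render a Nagios `define host` object for one asset."""
    return "\n".join([
        "define host {",
        f"    use        {template}",
        f"    host_name  {asset.hostname}",
        f"    address    {asset.address}",
        "}",
    ])


def service_block(asset: Asset, service: str,
                  template: str = "generic-service") -> str:
    """Render a Nagios `define service` object for one service on an asset."""
    return "\n".join([
        "define service {",
        f"    use                  {template}",
        f"    host_name            {asset.hostname}",
        f"    service_description  {service}",
        # Assumption: a command named check_<service> is defined elsewhere.
        f"    check_command        check_{service}",
        "}",
    ])


def render_config(assets: List[Asset]) -> str:
    """Concatenate host and service objects for every asset in the inventory."""
    blocks = []
    for asset in assets:
        blocks.append(host_block(asset))
        blocks.extend(service_block(asset, s) for s in asset.services)
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Example inventory with a made-up host; a real setup would query the AT database.
    inventory = [Asset("gridftp01.example.org", "192.0.2.10", ["gridftp", "ssh"])]
    print(render_config(inventory))
```

Generating the monitoring configuration from a single inventory source, rather than editing it by hand, is the kind of automation the abstract refers to when it mentions replacing error-prone manual configuration.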

Author: Tomasz Wlodek (Brookhaven National Laboratory)

Co-authors: Carlos Gamboa, Dantong Yu, Jason Smith, Robert Petkus, Shigeki Misawa, Tom Throwe, Yingzi Wu, Zhenping Liu (all Brookhaven National Laboratory)

