13–17 Feb 2006
Tata Institute of Fundamental Research
Europe/Zurich timezone

The Evolving Role of Monitoring in a Large-Scale Computing Facility

13 Feb 2006, 11:00
7h 10m

Tata Institute of Fundamental Research

Homi Bhabha Road Mumbai 400005 India
poster
Grid middleware and e-Infrastructure operation
Poster

Speaker

Dr Tony Chan (Brookhaven National Lab)

Description

Monitoring of a large-scale computing facility is evolving from a passive to a more active role in the LHC era: from tracking the health, availability and performance of the facility to automatically restoring availability, updating software and acting as a meta-scheduler for batch systems. This talk will discuss the experiences of the RHIC and ATLAS U.S. Tier 1 Computing Facility at Brookhaven National Lab in evaluating different monitoring software packages, and how monitoring is being used to improve efficiency and to integrate the facility with the Grid environment. A monitoring model that links geographically dispersed regional computing facilities, and that can be used to improve efficiency and throughput, will also be presented.

Primary authors

Mr Alex Withers (Brookhaven National Lab)
Dr Bruce Gibbard (Brookhaven National Lab)
Mr Chris Hollowell (Brookhaven National Lab)
Mr Jason Smith (Brookhaven National Lab)
Mr John Hover (Brookhaven National Lab)
Dr Ofer Rind (Brookhaven National Lab)
Mr Rob Petkus (Brookhaven National Lab)
Dr Tom Throwe (Brookhaven National Lab)
Dr Tony Chan (Brookhaven National Lab)
Dr Xin Zhao (Brookhaven National Lab)
Mrs Zhenping Liu (Brookhaven National Lab)
