Speakers
Mr
Gilles Mathieu
(IN2P3, Lyon)
Ms
Helene Cordier
(IN2P3, Lyon)
Mr
Piotr Nyczyk
(CERN)
Description
The paper reports on the evolution of the operational model that was set up in the
"Enabling Grids for E-sciencE" (EGEE) project, and on the implications for Grid
Operations in the LHC Computing Grid (LCG).
The primary tasks of Grid Operations cover monitoring of resources and services,
notification of failures to the relevant contacts and problem tracking through a
ticketing system. In addition, an escalation procedure is enforced to urge the
responsible bodies to address and solve the problems. Extensive knowledge has been
collected, documented and published in a form that facilitates the rapid resolution
of common problems.
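
The ticket tracking and escalation steps described above can be illustrated with a minimal sketch. It is not the EGEE/LCG implementation: the escalation levels, the three-day deadline and all names (Ticket, open_ticket, escalate_if_overdue) are assumptions made only for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical escalation ladder and deadline; the abstract does not specify them.
ESCALATION_STEPS = ["site contact", "regional contact", "project management"]
STEP_DEADLINE = timedelta(days=3)


@dataclass
class Ticket:
    site: str
    problem: str
    opened: datetime
    level: int = 0          # index into ESCALATION_STEPS
    solved: bool = False
    history: list = field(default_factory=list)


def open_ticket(site, problem, now):
    """Record a failure and notify the first contact in the ladder."""
    ticket = Ticket(site, problem, now)
    ticket.history.append((now, f"notified {ESCALATION_STEPS[0]}"))
    return ticket


def escalate_if_overdue(ticket, now):
    """Move an unsolved ticket up one level per elapsed deadline, capped at the top."""
    if ticket.solved:
        return ticket.level
    elapsed_steps = (now - ticket.opened) // STEP_DEADLINE
    target = min(elapsed_steps, len(ESCALATION_STEPS) - 1)
    while ticket.level < target:
        ticket.level += 1
        ticket.history.append((now, f"escalated to {ESCALATION_STEPS[ticket.level]}"))
    return ticket.level
```

In this sketch an operator would call escalate_if_overdue periodically on every open ticket. The history list keeps the audit trail that a ticketing system would normally provide.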
Initially, the daily operations were performed by a single person at CERN, but the
task soon required setting up a small team. The number of sites in production quickly
expanded from 60 to 170 in less than a year. The expansion of the EGEE/LCG
infrastructure has led to a distributed workload that involves more and more
geographically scattered teams.
The evolution of both procedures and workflow requires steady refinement of the tools
used for monitoring and operations management: the ticketing system, the knowledge
database and the integration platform.
Since the EGEE/LCG production infrastructure relies on the availability of robust
operations mechanisms, it is essential to gradually improve the operational
procedures and to track the progress of the tools' ongoing development.
Primary authors
Mr
Frederic Schaer
(IN2P3, Lyon)
Mr
Gilles Mathieu
(IN2P3, Lyon)
Ms
Helene Cordier
(IN2P3, Lyon)
Ms
Judit Novak
(CERN)
Mr
Markus Schulz
(CERN)
Mr
Min-Hong Tsai
(Academia Sinica, Taipei)
Mr
Piotr Nyczyk
(CERN)