Speakers
Ms Maite Barroso (CERN)
Nicholas Thackray (CERN)
Description
A review of the evolution of WLCG/EGEE grid operations
Authors: Maria BARROSO, Diana BOSIO, David COLLADOS, Maria DIMOU, Antonio RETICO, John SHADE, Nick THACKRAY, Steve TRAYLEN, Romain WARTEL
As the EGEE grid infrastructure continues to grow in size, complexity and usage, the task of ensuring the
continued, uninterrupted availability of the grid services to the ever-increasing number of user communities becomes more and more challenging. These challenges will only
increase with the significant ramp-up, in 2009, of data taking at the Large Hadron Collider (LHC), whose main experiments are, through the WLCG service, by far the largest users of the EGEE grid
infrastructure. In this paper we discuss the ways in which the processes and tools of grid operations have been appraised and enhanced over the last 18 months in order to meet these challenges without any
increase in the size of the team, while at the same time improving the overall level of service that the users experience when using the grid infrastructure. The improvements to the operations procedures and tools
include: enhancements to the middleware lifecycle processes; improvements to operations communications channels (both to VOs and to sites); strategies to raise the availability and reliability of
sites; improvements in the level of service supplied by the central grid operations tools; improvements to the robustness of core middleware services; enhancements to the handling of trouble tickets; sharing of best
practices; and others.
These points are then brought together to describe how the central grid operations team has learned valuable lessons through the day-to-day experience of operating the infrastructure, and
how operations has evolved as a result. In the last part of the paper, we examine future plans for further improvements in grid operations, including how we will deal with the unavoidable
reduction in the level of effort available for grid operations as the funding for EGEE comes to an end in early 2010, just as the use of the grid by the LHC experiments will dramatically increase.
Presentation type (oral | poster): oral
Primary author
Ms Maite Barroso (CERN)