21-27 March 2009
Advances in Grid Operations

23 Mar 2009, 14:40
Prague Congress Centre 5. května 65, 140 00 Prague 4, Czech Republic
Ms Maite Barroso (CERN), Nicholas Thackray (CERN)


A review of the evolution of WLCG/EGEE grid operations

Authors: Maria BARROSO, Diana BOSIO, David COLLADOS, Maria DIMOU, Antonio RETICO, John SHADE, Nick THACKRAY, Steve TRAYLEN, Romain WARTEL

As the EGEE grid infrastructure continues to grow in size, complexity and usage, ensuring the continued, uninterrupted availability of grid services to an ever-increasing number of user communities becomes more and more challenging. These challenges will only increase with the significant ramp‐up, in 2009, of data taking at the Large Hadron Collider, whose main experiments are, through the WLCG service, by far the largest users of the EGEE grid infrastructure. In this paper we discuss how the processes and tools of grid operations have been appraised and enhanced over the last 18 months to meet these challenges without any increase in the size of the team, while at the same time improving the overall level of service that users experience on the grid infrastructure. The improvements to the operations procedures and tools include: enhancements to the middleware lifecycle processes; improvements to operations communications channels (both to VOs and to sites); strategies to raise the availability and reliability of sites; improvements in the level of service supplied by the central grid operations tools; improvements to the robustness of core middleware services; enhancements to the handling of trouble tickets; sharing of best practices; and others. These points are then brought together to describe the lessons the central grid operations team has learned through the day‐to‐day experience of operating the infrastructure, and how operations have evolved as a result.
In the last part of the paper, we examine the plans for further improvements in grid operations, including how we will deal with the unavoidable reduction in the level of effort available for grid operations as the funding for EGEE comes to an end in early 2010, just as the use of the grid by the LHC experiments is set to increase dramatically.

