Pablo Saiz (CERN)
The LHC experiments are going to start collecting data during the spring of 2009. The number of people and centers involved in these experiments sets a new record in the physics community. For instance, CMS involves more than 3600 physicists and more than 60 centers distributed all over the world. Managing such a large number of distributed sites and services is not a trivial task. Moreover, the definition of proper behavior for a site strongly depends on the software model of each experiment. To make the situation even more difficult, the status of the sites changes dynamically. To cope with such a large-scale heterogeneous infrastructure, it is necessary to have monitoring tools that provide a complete and reliable view of the overall performance and status of the sites. The LHC experiments need to follow their computing activities at the sites and want to make sure that the sites provide the required level of reliability and performance. The Site Status Board application has been developed in the Dashboard framework to monitor the status of the sites from the perspective of the LHC experiments or any other Virtual Organization. The definition of the status is based on metrics defined by the Virtual Organization. Moreover, the Site Status Board keeps track of how the different metrics evolve over time. The Site Status Board is generic and can be used by any Virtual Organization. At the moment, it is used both by the commissioning activity and by the computing shifts in CMS. In the rest of this paper we describe the details of the Site Status Board implementation and functionality, its use cases, and the direction of new developments.