Service Availability Monitor for the LHC experiments

Dr Alessandro Di Girolamo (CERN)

Describe the scientific/technical community and the scientific/technical activity using (planning to use) the EGEE infrastructure. A high-level description is needed (neither a detailed specialist report nor a list of references).

The four LHC experiments, ALICE, ATLAS, CMS and LHCb, depend on the EGEE grid to carry out their scientific programme and all the computing activities in preparation for the LHC collider startup (2008). The Service Availability Monitor (SAM) was developed by SA1 primarily to test the EGEE infrastructure and to collect and maintain the corresponding information.

Report on the experience (or the proposed activity). It would be very important to mention key services which are essential for the success of your activity on the EGEE infrastructure.

The integration of SAM with the experiment frameworks has differed slightly among the four experiments, and we report on the following examples:
ALICE: VOBOX testing and installation
ATLAS and CMS: validation of the software installation and low-level testing of the Storage Resource Manager (SRM)
CMS: calibration database and local storage access from the worker nodes
LHCb: software installation and validation integrated in the DIRAC production and analysis system

We will report on the experience of this integration from the application point of view, and on the resulting improvements in the operation of such large systems.

Describe the added value of the Grid for the scientific/technical activity you (plan to) do on the Grid. This should include the scale of the activity and of the potential user community and the relevance for other scientific or business applications.

The four LHC experiments rely on the EGEE infrastructure to perform their simulation, reconstruction and analysis activities. These large-scale activities require a stable environment, not only for fundamental services such as the storage services, but also for experiment-specific services such as the software and data distribution services.
The integration between SAM and the application-specific frameworks is essential to achieve high efficiency in large-scale activities and to provide dependable services for large user communities. SAM is already widely used in EGEE operations to identify malfunctions in grid services, and it can be adapted to perform the same function for experiment-specific services. The tool can also accommodate application-dependent tests, which is of great interest for large-scale applications that rely on both the grid and application-specific services.
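
The idea of running generic grid-service tests and application-dependent tests side by side can be illustrated with a minimal sketch. It is not the SAM implementation or its interface: the names ProbeResult, run_probes, site_available, srm_probe and software_probe, the site names, and the hard-coded outcomes are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class ProbeResult:
    """Outcome of one test against one site (hypothetical structure)."""
    service: str   # name of the tested service, e.g. "SRM"
    ok: bool       # True if the test passed
    detail: str = ""


# A probe is any callable that takes a site name and returns a ProbeResult.
Probe = Callable[[str], ProbeResult]


def run_probes(site: str, probes: List[Probe]) -> List[ProbeResult]:
    """Run every probe against one site; a crashing probe counts as a failure."""
    results = []
    for probe in probes:
        try:
            results.append(probe(site))
        except Exception as exc:
            name = getattr(probe, "__name__", "unknown")
            results.append(ProbeResult(name, False, str(exc)))
    return results


def site_available(results: List[ProbeResult], critical: Set[str]) -> bool:
    """A site is available when every critical service passed its test."""
    return all(r.ok for r in results if r.service in critical)


# Hypothetical probes with hard-coded outcomes, standing in for real tests.
def srm_probe(site: str) -> ProbeResult:
    """Generic grid-service test (assumed to pass everywhere in this sketch)."""
    return ProbeResult("SRM", True)


def software_probe(site: str) -> ProbeResult:
    """Experiment-specific test (assumed to fail at the made-up site SITE_B)."""
    ok = site != "SITE_B"
    return ProbeResult("software-installation", ok, "" if ok else "release missing")


if __name__ == "__main__":
    critical = {"SRM", "software-installation"}
    for site in ("SITE_A", "SITE_B"):
        results = run_probes(site, [srm_probe, software_probe])
        print(site, "available" if site_available(results, critical) else "unavailable")
```

In this sketch each experiment chooses its own set of critical services, so the same test results can yield a different availability verdict for a generic grid view and for an experiment-specific view.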

