15–19 Sept 2008
Naxos - GREECE
Europe/Athens timezone

Infrastructures and monitoring of the on-line CMS computing center

18 Sept 2008, 16:15
2h

Speaker

Dr Attila RACZ (CERN)

Description

This paper describes in detail the infrastructure and installation of the CMS on-line computing center (CMSCC) and its associated monitoring system. In summer 2007, 640 readout units/builder units were deployed along with ~150 servers for general DAQ services. Since summer 2008, ~500 filter units have been added, and today the CMSCC has an on-line processing capability sufficient for a LV1A trigger rate of 50 kHz. To ensure that these ~1300 servers perform the tasks expected of them, a multi-level monitoring system has been put in place. This system is also described in this paper.

Summary

The on-line CMS computing center, located at the surface of the experimental site, performs the event assembly (640 event fragments produced by the detector are assembled into a single event of ~1 MB) and subsequently executes the high level trigger (HLT) algorithms in order to select the events to be stored for later off-line analysis.
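
As a rough illustration of the load these figures imply, the sketch below combines the numbers quoted in this abstract (640 fragments, ~1 MB events, 50 kHz trigger rate). It assumes 1 MB = 10^6 bytes and, for the per-unit figure only, that traffic is spread evenly over 640 builder units; neither assumption is stated in the paper, and the function names are illustrative.

```python
# Back-of-envelope event-builder load from the figures quoted in this abstract.
FRAGMENTS_PER_EVENT = 640       # event fragments produced by the detector
EVENT_SIZE_BYTES = 1_000_000    # ~1 MB per assembled event (assuming 1 MB = 10^6 bytes)
TRIGGER_RATE_HZ = 50_000        # LV1A trigger rate of 50 kHz
BUILDER_UNITS = 640             # assumption: load shared evenly by 640 builder units


def average_fragment_size(event_size: int, fragments: int) -> float:
    """Mean size in bytes of one event fragment."""
    return event_size / fragments


def aggregate_throughput(event_size: int, rate_hz: int) -> int:
    """Total bytes per second that must be assembled at the given trigger rate."""
    return event_size * rate_hz


fragment_bytes = average_fragment_size(EVENT_SIZE_BYTES, FRAGMENTS_PER_EVENT)
total_bytes_per_s = aggregate_throughput(EVENT_SIZE_BYTES, TRIGGER_RATE_HZ)
per_unit_bytes_per_s = total_bytes_per_s / BUILDER_UNITS

print(f"average fragment size: {fragment_bytes:.1f} bytes")
print(f"aggregate throughput:  {total_bytes_per_s / 1e9:.0f} GB/s")
print(f"per builder unit:      {per_unit_bytes_per_s / 1e6:.1f} MB/s")
```

Under these assumptions an average fragment is about 1.6 kB and the aggregate input is about 50 GB/s.
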
The heavy infrastructure (false floor, water ducts, racks, power rails) was installed in 2005 and 2006. Cabling for the first batch of 800 servers (event builder PCs) started in early 2007. The event builder PCs were installed and commissioned in summer 2007. They also act as event analyzers as long as the data volume does not require dedicated PCs to run the HLT algorithms. About 500 servers for data analysis have been installed this summer in view of the LHC startup. An additional 1500 servers are foreseen for 2009 to reach the full processing power.
The monitoring system watches the servers at three levels: the first level deals with physical parameters (voltages, temperatures, fans) and maintenance/repair actions. The second level monitors the services provided by each server (ssh, tcp, presence of drivers, etc.). The third level looks at application performance. Data retrieved by the three levels of monitoring are stored in a database.
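
The three-level scheme described above can be sketched as follows. This is a minimal illustration, not the CMSCC implementation: the `Reading` record, the check functions, the thresholds, the host name and the use of an in-memory SQLite database are all assumptions made for the example, and the checks operate on pre-collected sample values rather than querying real hardware.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Reading:
    host: str
    level: int    # 1 = physical parameters, 2 = services, 3 = application
    name: str
    value: float
    ok: bool


def check_hardware(host, sensors, max_temp_c=70.0):
    """Level 1: physical parameters (the temperature threshold is hypothetical)."""
    temp, fan = sensors["cpu_temp_c"], sensors["fan_rpm"]
    return [
        Reading(host, 1, "cpu_temp_c", temp, temp <= max_temp_c),
        Reading(host, 1, "fan_rpm", fan, fan > 0),
    ]


def check_services(host, services):
    """Level 2: services expected on each server (e.g. ssh, loaded drivers)."""
    return [Reading(host, 2, name, float(up), up) for name, up in services.items()]


def check_application(host, events_per_s, min_rate=1.0):
    """Level 3: application performance (the minimum rate is hypothetical)."""
    return [Reading(host, 3, "events_per_s", events_per_s, events_per_s >= min_rate)]


def store(conn, readings):
    """Persist readings from all three levels in a single table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(host TEXT, level INTEGER, name TEXT, value REAL, ok INTEGER)"
    )
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?, ?, ?, ?)",
        [(r.host, r.level, r.name, r.value, int(r.ok)) for r in readings],
    )
    conn.commit()


def failing(conn):
    """Return (host, level, name) for every reading that failed its check."""
    return conn.execute(
        "SELECT host, level, name FROM readings WHERE ok = 0 ORDER BY level, name"
    ).fetchall()


# Sample values for one hypothetical server.
conn = sqlite3.connect(":memory:")
readings = (
    check_hardware("node-001", {"cpu_temp_c": 82.0, "fan_rpm": 4200.0})
    + check_services("node-001", {"ssh": True, "driver_loaded": False})
    + check_application("node-001", 75.0)
)
store(conn, readings)
print(failing(conn))
```

Keeping all levels in one table makes it easy to correlate, for instance, an overheating server (level 1) with a drop in its application rate (level 3).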

Primary author

Dr Attila RACZ (CERN)
