Apr 15 – 19, 2013
CNAF Bologna (Italy)
Europe/Rome timezone

Monitoring multi-PB disk farms at CERN for the HEP experiments: a service manager view

Apr 18, 2013, 2:10 PM
CNAF Bologna (Italy)

Presentation Storage & Filesystems


Massimo Lamanna (CERN)


Over the last two years our team (operating CASTOR and EOS at CERN) has invested heavily in monitoring our large disk-server clusters. The CASTOR and EOS disk-server farms each contain about 800 machines and about 15 PB of usable disk space. Each of them produces logging information at a rate between 30 and 60 GB/day, which is vital for monitoring, accounting, troubleshooting and disaster recovery. The new system, named Cockpit, has been designed around the experience and requirements of the operations team, in close contact with the Agile monitoring working group. In this presentation we will illustrate the present status of the project, its main components and usage, and the lessons learnt. The system has been in production for CASTOR for several months and is under deployment for EOS.
