15–19 Apr 2013
CNAF Bologna (Italy)
Europe/Rome timezone

Monitoring multi-PB disk farms at CERN for the HEP experiments: a service manager view

18 Apr 2013, 14:10
25m
CNAF Bologna (Italy)

Presentation Storage & Filesystems

Speaker

Massimo Lamanna (CERN)

Description

Over the last two years our team, which operates CASTOR and EOS at CERN, has invested heavily in monitoring our large disk-server clusters. The CASTOR and EOS disk-server farms contain about 800 machines each and about 15 PB of usable disk space. Each of them produces logging information at a rate between 30 and 60 GB/day, which is vital for monitoring, accounting, troubleshooting and disaster recovery. The new system, named Cockpit, has been designed around the experience and requirements of the operations team, in close contact with the Agile monitoring working group. In this presentation we will illustrate the present status of the project, its main components and usage, and the lessons learnt. The system has been in production for CASTOR for several months and is under deployment for EOS.
