Speaker
Massimo Lamanna
(CERN)
Description
In the last two years our team (operating CASTOR and EOS at CERN) has invested heavily in monitoring our large disk-server clusters. The CASTOR and EOS disk-server farms each contain about 800 machines and about 15 PB of usable disk space. Each farm produces between 30 and 60 GB of logging information per day, which is vital for monitoring, accounting, troubleshooting and disaster recovery.
The new system, named Cockpit, has been designed around the experience and requirements of the operations team, in close collaboration with the Agile monitoring working group.
In this presentation we will illustrate the present status of the project, its main components and their usage, and the lessons learnt. The system has been in production for CASTOR for several months and is being deployed for EOS.
Primary author
Massimo Lamanna
(CERN)