Monitoring Evolution at CERN

Apr 13, 2015, 2:15 PM
Pedro Andrade (CERN)


Over the past two years, the operation of the CERN Data Centres has changed significantly with the introduction of new mechanisms for hardware procurement and new services for cloud infrastructure and configuration management, among other improvements. As a result, more resources are now operated in a more dynamic environment. Today, the CERN Data Centres provide over 11,000 multi-core servers, 130 PB of disk server capacity, 100 PB of tape robot capacity, and 150 high-performance tape drives. To cope with these developments, the data centre monitoring tools also had to evolve. This modernisation followed a number of guiding rules: sustain the increase in resources, adapt to the new dynamic nature of the data centres, make monitoring data easier to share, give Service Managers more flexibility in how they publish and consume monitoring metrics and logs, establish a common repository of monitoring data, optimise the handling of monitoring notifications, and replace the previous toolset with new open source technologies that have large adoption and community support. This talk will explain how these improvements were delivered, present the architecture and technologies of the new monitoring tools, and review the experience of their production deployment.

Benjamin Fiorini (CERN), Luis Pigueiras (Universidad de Oviedo (ES)), Lukasz Starakiewicz (Technische Universitaet Muenchen (DE)), Miguel Coelho dos Santos (CERN), Susie Murphy (CERN)

