25–29 Apr 2022
Europe/Zurich timezone

Anomaly Detection System for the CERN Cloud Monitoring

28 Apr 2022, 09:25
25m
Online workshop

Grid, Cloud & Virtualisation

Speaker

Antonin Dvorak (Czech Academy of Sciences (CZ))

Description

As CERN cloud service managers, one of our tasks is to make sure that the desired computational power is delivered to all users of our scientific community. This task is accomplished by monitoring the utilization metrics of each hypervisor and reacting to alarms in case of server saturation to mitigate the interference between VMs.

To maximize the efficiency of our cloud infrastructure and to reduce the monitoring effort for service managers, we have developed an Anomaly Detection System that applies unsupervised machine learning methods to time series metrics. The system uses ensemble strategies to combine traditional and deep learning approaches.
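As an illustration of the ensemble idea only, the sketch below combines two simple unsupervised detectors by rank-averaging their scores. The abstract does not specify which detectors are used or how they are combined, so everything here is an assumption: the two statistical scorers stand in for the traditional and deep learning models, and the function names and sample values are hypothetical.

```python
from statistics import mean, median, pstdev


def zscore_scores(values):
    """Score each point by its distance from the mean, in standard deviations."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # fall back to 1.0 on a flat series
    return [abs(v - mu) / sigma for v in values]


def mad_scores(values):
    """Score each point by its distance from the median, in units of MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values]) or 1.0
    return [abs(v - med) / mad for v in values]


def rank_normalise(scores):
    """Map scores to [0, 1] by rank so that detectors with different scales are comparable."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    for rank, idx in enumerate(order):
        ranks[idx] = rank / (len(scores) - 1)
    return ranks


def ensemble_scores(values, detectors=(zscore_scores, mad_scores)):
    """Average the rank-normalised scores of all detectors (assumed combination rule)."""
    normalised = [rank_normalise(detector(values)) for detector in detectors]
    return [mean(column) for column in zip(*normalised)]


# Hypothetical CPU utilisation samples (%) for one hypervisor.
cpu_load = [21, 23, 22, 24, 20, 22, 97, 23, 21, 22]
scores = ensemble_scores(cpu_load)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
```

Rank normalisation is used here because raw scores from different detectors are not on a common scale; other combination rules, such as taking the maximum score, would fit the same structure.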

This contribution presents the design of our Anomaly Detection System, the algorithms it uses, and their performance in the daily operation of the CERN cloud. The analytics pipeline relies on open-source tools and frameworks adopted at CERN, such as PyOD, TensorFlow, Spark, Apache Airflow, Grafana, and Elasticsearch.

Speaker release Yes

Authors

Antonin Dvorak (Czech Academy of Sciences (CZ))
Domenico Giordano (CERN)
Stiven Metaj (Politecnico di Milano (IT))
