Speaker
Mr
Erekle Magradze
(Georg-August-Universitaet Goettingen (DE))
Description
High-throughput computing platforms consist of complex infrastructure and provide a number of services that are prone to failure. To mitigate the impact of failures on the quality of the provided services, constant monitoring and timely reaction are required, which is impossible without automating the system administration processes. This paper introduces a way to automate the analysis of monitoring information in order to provide long- and short-term predictions of the service response time (SRT) of the mass storage and batch systems, and to identify the status of a service at a given time. The SRT predictions are based on an Adaptive Neuro-Fuzzy Inference System (ANFIS), while the K-means clustering algorithm is employed for service status identification. Both approaches are evaluated on real monitoring data from the WLCG Tier 2 center GoeGrid. Ten-fold cross-validation results demonstrate the high efficiency of both approaches in comparison to known methods.
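To illustrate the idea of status identification by clustering, here is a minimal Python sketch of K-means applied to one-dimensional response times. The abstract does not state which features, how many clusters, or which status labels were used. The sample values, the choice of three clusters, the names `kmeans_1d` and `label`, and the labels `ok`, `degraded`, and `critical` are therefore assumptions made only for illustration, not the authors' implementation.

```python
from statistics import mean


def kmeans_1d(values, k, iterations=100):
    """Cluster scalar values into k groups with Lloyd's algorithm."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: each centroid moves to the mean of its cluster.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # converged, assignments no longer change
        centroids = updated
    return centroids


def label(value, centroids, names):
    """Map a response time to the name of its nearest centroid (lowest first)."""
    ranked = sorted(centroids)
    nearest = min(range(len(ranked)), key=lambda i: abs(value - ranked[i]))
    return names[nearest]


# Hypothetical service response times in seconds (not real GoeGrid data).
srt_samples = [0.21, 0.25, 0.19, 0.23, 1.10, 1.25, 0.98, 5.40, 6.10, 5.80]
STATUS_NAMES = ["ok", "degraded", "critical"]  # assumed labels

centroids = kmeans_1d(srt_samples, k=3)
for sample in (0.2, 1.0, 6.0):
    print(sample, label(sample, centroids, STATUS_NAMES))
```

In practice the clustering would likely run on several monitoring metrics at once rather than on a single response-time value. The one-dimensional case is used here only to keep the assignment and update steps easy to follow.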
Author
Mr
Erekle Magradze
(Georg-August-Universitaet Goettingen (DE))
Co-authors
Arnulf Quadt
(Georg-August-Universitaet Goettingen (DE))
Gen Kawamura
(Georg-August-Universitaet Goettingen (DE))
Haykuhi Musheghyan
(Georg-August-Universitaet Goettingen (DE))
Jordi Nadal Serrano
(Georg-August-Universitaet Goettingen (DE))