24th International Conference on Computing in High Energy & Nuclear Physics

Name: 24th International Conference on Computing in High Energy & Nuclear Physics
Start: 2019-11-04T08:00:00+10:30
End: 2019-11-08T13:00:00+10:30
Location: Adelaide Convention Centre

Anomaly detection using Unsupervised Machine Learning for Grid computing site operation

4 Nov 2019, 11:15

15m

Riverbank R7 (Adelaide Convention Centre)

Oral: Track 7 – Facilities, Clouds and Containers

Tomoe Kishimoto (University of Tokyo (JP))

A Grid computing site consists of various services, including Grid middleware components such as the Computing Element and the Storage Element. Ensuring the safe and stable operation of these services is a key responsibility of site administrators. Logs produced by the services provide useful information for understanding the status of the site. However, monitoring and analyzing the service logs every day is a time-consuming task for site administrators. Therefore, a support framework (gridalert), which detects anomalous logs and alerts site administrators, has been developed using Machine Learning techniques.

Typical Machine Learning classification requires pre-defined labels. It is difficult to collect enough anomalous logs to build a Machine Learning model that covers all possible pre-defined anomalies. Therefore, gridalert uses Unsupervised Machine Learning based on clustering algorithms to detect anomalous logs. Several clustering algorithms, such as k-means, DBSCAN and IsolationForest, and their parameters have been compared in order to maximize anomaly detection performance for Grid computing site operations. gridalert has been deployed at the Tokyo Tier2 site, which is one of the Worldwide LHC Computing Grid sites, and is used in operation. This presentation reports studies of Machine Learning algorithms for anomaly detection and our operational experience with gridalert.
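As an illustration of how density-based clustering can flag unusual log lines without labels, the following is a minimal sketch and not the gridalert implementation. The abstract does not describe how logs are turned into features, so the bag-of-words vectorization, the cosine distance, the sample log lines, and the names `vectorize`, `cosine_distance` and `dbscan_noise` are all assumptions made for this example. It applies the DBSCAN noise rule: a line is reported as anomalous if it is neither a core point nor within `eps` of a core point.

```python
import math
import re
from collections import Counter


def vectorize(line):
    """Bag-of-words vector of lowercase alphabetic tokens.

    Digits are dropped so that job IDs and counters do not make
    otherwise identical messages look different (an assumption of this sketch).
    """
    return Counter(re.findall(r"[a-z]+", line.lower()))


def cosine_distance(a, b):
    """Cosine distance between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 if norm == 0 else 1.0 - dot / norm


def dbscan_noise(lines, eps=0.5, min_samples=3):
    """Return the lines that DBSCAN would label as noise.

    A point is a core point if at least min_samples points (itself included)
    lie within eps of it. Noise points are non-core points that have no
    core point within eps.
    """
    vecs = [vectorize(line) for line in lines]
    n = len(vecs)
    neighbours = [
        [j for j in range(n) if cosine_distance(vecs[i], vecs[j]) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbours[i]) >= min_samples}
    return [
        lines[i]
        for i in range(n)
        if i not in core and not any(j in core for j in neighbours[i])
    ]


# Hypothetical log lines, invented for this example.
logs = [
    "job 1001 finished with exit code 0",
    "job 1002 finished with exit code 0",
    "job 1003 finished with exit code 0",
    "job 1004 finished with exit code 0",
    "transfer of file 17 completed",
    "transfer of file 18 completed",
    "transfer of file 19 completed",
    "disk quota exceeded on storage pool",
]

anomalies = dbscan_noise(logs)
print(anomalies)
```

In this toy data the frequent job and transfer messages form dense groups, while the single quota message has no close neighbours. The values of `eps` and `min_samples` are arbitrary here; the abstract notes that such parameters were compared across algorithms, and suitable values would depend on the real logs and features.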

Tomoe Kishimoto (University of Tokyo (JP)), Junichi Tanaka (University of Tokyo (JP)), Tetsuro Mashimo (University of Tokyo (JP)), Ryu Sawada (University of Tokyo (JP)), Koji Terashi (University of Tokyo (JP)), Michiru Kaneda (ICEPP, the University of Tokyo), Masahiko Saito (University of Tokyo (JP)), Nagataka Matsui (University of Tokyo (JP))

Tomoe_CHEP2019.pdf
