ACAT 2016

Name: ACAT 2016
Start: 2016-01-18T08:00:00-03:00
End: 2016-01-22T18:00:00-03:00
Location: UTFSM, Valparaíso (Chile)

18–22 Jan 2016

Chile/Continental timezone

Secretary

acat2016@usm.cl

A scalable architecture for online anomaly detection of WLCG batch jobs

21 Jan 2016, 14:50

25m

UTFSM, Valparaíso (Chile)

Avenida España 1680, Valparaíso Chile

Oral Computing Technology for Physics Research Track 1

Speaker: Manuel Giffels (KIT - Karlsruhe Institute of Technology (DE))

For data centres it is increasingly important to monitor network usage and to learn from network usage patterns. In particular, configuration issues and misbehaving jobs that prevent smooth operation need to be detected as early as possible. At the GridKa Tier 1 centre we therefore operate a tool that monitors traffic data and characteristics of WLCG jobs and pilots locally on different worker nodes. On the one hand, local information alone is not sufficient to detect anomalies for several reasons, e.g. the underlying job distribution on a single worker node might change, or there might be a local misconfiguration. On the other hand, a centralised anomaly detection approach does not scale with respect to network communication or computational cost. We therefore propose a scalable architecture based on the concepts of a super-peer network. The contribution discusses several issues regarding the optimisation of computational cost, network overhead, and accuracy of anomaly detection. Based on simulations, we will show the influence of different parameters, e.g. network size and location of computation, as well as characteristics of WLCG batch jobs. The simulations are based on real batch job network traffic data collected over several months.
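The trade-off between purely local and fully centralised detection can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names neither the statistic nor the detection rule, so the mergeable mean/variance summary, the z-score threshold, the worker node names, and the traffic values below are all assumptions chosen for illustration. Each worker node reduces its job traffic to a small summary, and a super-peer merges the summaries of its group instead of receiving the raw data.

```python
from dataclasses import dataclass
import math


@dataclass
class Summary:
    """Mergeable statistics over per-job traffic rates (hypothetical unit: kB/s)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean

    def add(self, x: float) -> None:
        # Welford's online update for a single observation.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "Summary") -> "Summary":
        # Pairwise combination of two partial summaries (Chan et al.).
        n = self.n + other.n
        if n == 0:
            return Summary()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Summary(n, mean, m2)

    @property
    def std(self) -> float:
        # Sample standard deviation; 0.0 if fewer than two observations.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def summarise(rates):
    """Runs on a worker node: reduce local job traffic to one Summary."""
    s = Summary()
    for r in rates:
        s.add(r)
    return s


def super_peer_merge(summaries):
    """Runs on a super-peer: combine the summaries of its worker nodes."""
    total = Summary()
    for s in summaries:
        total = total.merge(s)
    return total


def is_anomalous(rate, reference, threshold=3.0):
    """Assumed rule: flag a job whose rate deviates by more than `threshold` sigma."""
    if reference.std == 0.0:
        return False
    return abs(rate - reference.mean) / reference.std > threshold


# Invented traffic rates for three hypothetical worker nodes.
workers = {
    "wn01": [10.0, 12.0, 11.0, 9.0],
    "wn02": [10.5, 11.5, 9.5, 10.0],
    "wn03": [11.0, 10.0, 12.5, 9.5],
}
reference = super_peer_merge(summarise(r) for r in workers.values())
```

In this sketch only three numbers per worker node cross the network, while a job is still judged against a reference built from more than one node. A call such as `is_anomalous(500.0, reference)` then compares a single job's rate against the group-wide statistics.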

Author: Eileen Kuhn (KIT - Karlsruhe Institute of Technology (DE))

Co-authors: Andreas Petzold (KIT - Karlsruhe Institute of Technology (DE)); Christopher Jung; Manuel Giffels (KIT - Karlsruhe Institute of Technology (DE)); Max Fischer (KIT - Karlsruhe Institute of Technology (DE))

20160121_Giffels_ScalableArchitecture.pdf

20160629_scalable_architecture.pdf
