Nov 4 – 8, 2019
Adelaide Convention Centre
Australia/Adelaide timezone

Monitoring distributed computing beyond the traditional time-series histogram

Nov 5, 2019, 3:15 PM
15m
Riverbank R3 (Adelaide Convention Centre)

Oral Track 3 – Middleware and Distributed Computing

Speaker

Peter Love (Lancaster University (GB))

Description

In this work we review existing monitoring outputs and recommend some novel alternative approaches to improve the comprehension of the large volumes of operations data produced in distributed computing. Current monitoring output is dominated by the pervasive use of time-series histograms showing the evolution of various metrics. These can quickly overwhelm or confuse the viewer because of the large number of similar-looking plots. We propose a supplementary approach through the sonification of real-time data streamed directly from a variety of distributed computing services. The real-time nature of this method allows operations staff to detect problems quickly and to confirm whether a problem is still ongoing, avoiding the situation of investigating an issue after the fact, when it may already have been resolved. In this paper we present details of the system architecture and provide a recipe for deployment suitable for both site and experiment teams.
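To make the idea of sonification concrete, the following is a minimal sketch of one common approach: mapping each reading of a monitoring metric to the pitch of a short tone. It is not the authors' system. The linear-to-logarithmic pitch mapping, the 220 to 880 Hz range, the sine-wave tones, the WAV output and all function names (`metric_to_frequency`, `tone`, `sonify`, `write_wav`) are illustrative assumptions, and the real-time streaming from distributed computing services described above is not shown.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second (assumed; CD quality)


def metric_to_frequency(value, lo, hi, f_min=220.0, f_max=880.0):
    """Hypothetical mapping of a metric value in [lo, hi] to a pitch in Hz.

    The value is clamped to [lo, hi], then mapped on a logarithmic (musical)
    scale so that equal metric steps are heard as equal pitch intervals.
    """
    x = min(max(value, lo), hi)
    frac = (x - lo) / (hi - lo) if hi > lo else 0.0
    return f_min * (f_max / f_min) ** frac


def tone(freq, duration=0.2, amplitude=0.5):
    """Generate one sine-wave tone as a list of float samples in [-1, 1]."""
    n = int(round(SAMPLE_RATE * duration))
    return [amplitude * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
            for i in range(n)]


def sonify(values, lo, hi, duration=0.2):
    """Turn a sequence of metric readings into one tone per reading."""
    samples = []
    for v in values:
        samples.extend(tone(metric_to_frequency(v, lo, hi), duration))
    return samples


def write_wav(path_or_file, samples):
    """Write float samples as 16-bit mono PCM to a path or file-like object."""
    with wave.open(path_or_file, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        w.setframerate(SAMPLE_RATE)
        frames = b"".join(struct.pack("<h", int(s * 32767)) for s in samples)
        w.writeframes(frames)


if __name__ == "__main__":
    # Hypothetical readings of a failed-job fraction; a rising failure rate
    # is heard as a rising pitch.
    readings = [0.01, 0.02, 0.15, 0.40]
    write_wav("failures.wav", sonify(readings, lo=0.0, hi=0.5))
```

A logarithmic pitch scale is chosen here because listeners perceive pitch intervals as frequency ratios, so a steady rise in the metric sounds like a steady rise in tone. A production system would stream tones continuously and smooth the transitions between them rather than writing a file.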


Primary authors

Peter Love (Lancaster University (GB)), Matthew Doidge (Lancaster University)
