14–18 May 2018
University of Wisconsin-Madison
America/Chicago timezone

A fully High-availability logs/metrics collector @ CSCS

17 May 2018, 14:00
20m
Chamberlin Hall (University of Wisconsin-Madison)

Madison, USA (43.073834, -89.405216)
Basic IT Services

Speaker

Mr Dino Conciatore (CSCS (Swiss National Supercomputing Centre))

Description

As systems grow in complexity and scale, the amount of system-level data they record grows with them.
Managing the vast amounts of log data is a challenge that CSCS solved with the introduction of a centralized log and metrics infrastructure based on Elasticsearch, Graylog, Kibana, and Grafana.
This is a fundamental service at CSCS: it makes events easy to correlate, bridging the gap from the computational workload to the nodes and enabling failure diagnosis.
Currently, the Elasticsearch cluster at CSCS handles more than 22 billion online documents (one year) and another 20 billion archived documents. The integrated environment, from logging to graphical representation, enables powerful dashboards and monitoring displays.
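
To illustrate what correlating events from the workload to the nodes can look like, here is a minimal Python sketch. It is not the CSCS implementation: the `Job` and `LogEvent` records, the `events_for_job` helper, the node names, and the sample data are all hypothetical. In a real deployment this kind of filter would more likely be expressed as a query against the log store than as in-memory filtering.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Job:
    """Hypothetical record of a compute job and the nodes it ran on."""
    job_id: str
    nodes: frozenset
    start: datetime
    end: datetime


@dataclass(frozen=True)
class LogEvent:
    """Hypothetical log record as a central collector might store it."""
    node: str
    timestamp: datetime
    message: str


def events_for_job(job, events):
    """Return events logged on the job's nodes during its run, oldest first."""
    matched = [
        e for e in events
        if e.node in job.nodes and job.start <= e.timestamp <= job.end
    ]
    return sorted(matched, key=lambda e: e.timestamp)


# Hypothetical sample data, for illustration only.
job = Job(
    job_id="job-1",
    nodes=frozenset({"nid0001", "nid0002"}),
    start=datetime(2018, 5, 17, 14, 0),
    end=datetime(2018, 5, 17, 14, 20),
)
events = [
    LogEvent("nid0002", datetime(2018, 5, 17, 14, 15), "link error"),
    LogEvent("nid0001", datetime(2018, 5, 17, 13, 50), "boot complete"),
    LogEvent("nid0003", datetime(2018, 5, 17, 14, 5), "disk warning"),
    LogEvent("nid0001", datetime(2018, 5, 17, 14, 10), "memory error"),
]

for event in events_for_job(job, events):
    print(event.timestamp.isoformat(), event.node, event.message)
```

The sketch keeps only events that match both the job's node list and its time window, which is the join that lets a failed job be traced back to the node-level messages logged while it ran.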

Author

Mr Dino Conciatore (CSCS (Swiss National Supercomputing Centre))
