Description
The complexity and scale of systems are increasing rapidly, and the volume of
related monitoring and accounting data is growing accordingly.
Managing this vast amount of data is a challenge that CSCS addressed by
introducing a Kubernetes cluster dedicated to dynamically deploying
data collection and analysis stacks comprising Elastic Stack, Kafka and
Grafana, both for internal use and for external customers' use cases.
This service has proved crucial at CSCS for correlating events and
extracting meaningful insights from event-related data: bridging the
gap between the computational workload and the status of resources enables
failure diagnosis, telemetry and effective collection of accounting
data.
The main production Elastic Stack at CSCS currently handles more than
200 billion online documents. The integrated environment, from data collection
to visualization, lets internal and external users produce their own powerful
dashboards and monitoring displays, which are fundamental to their data
analysis needs.