13 August 2019
CERN
Europe/Zurich timezone
There is a live webcast for this event.

EOS Winston: Expert Systems for Automated Diagnosis and Remediation

13 Aug 2019, 14:17
7m
31/3-004 - IT Amphitheatre (CERN)

Speaker

Ishank Arora

Description

The IT-ST group at CERN runs and evaluates innovative cloud storage technologies and applies them to big data problems in high-energy physics research. One of its main services is EOS, CERN's multi-petabyte disk-based storage service, which is built from commodity hardware and heavily used by both LHC and non-LHC experiments. The massive scale at which EOS runs leaves room for many issues and anomalies to creep in. These need to be dealt with in real time to ensure smooth operations.

The project aims to improve troubleshooting and diagnosis of the components that make up the EOS infrastructure by developing an Expert System. The system collects diagnostic information, such as metrics, signals, and alerts, from each of the namespaces. It helps engineers reduce the time needed to debug issues on the system by automating some of the troubleshooting tasks.
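
As a rough illustration of the rule-based idea behind an expert system, the sketch below matches collected diagnostic facts against a small set of rules and returns advice for an engineer. It is not the EOS Winston implementation: the metric names (disk_used_fraction, request_latency_ms), the thresholds, the Rule type, and the diagnose function are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical snapshot of diagnostic facts collected from one namespace.
# The keys used below are illustrative, not real EOS metric names.
Facts = Dict[str, float]


@dataclass
class Rule:
    """A single if-then rule: when `condition` holds, report `advice`."""
    name: str
    condition: Callable[[Facts], bool]
    advice: str


# Assumed example rules; thresholds are arbitrary placeholders.
RULES: List[Rule] = [
    Rule(
        name="disk_nearly_full",
        condition=lambda f: f.get("disk_used_fraction", 0.0) > 0.95,
        advice="free space is low; consider draining or rebalancing",
    ),
    Rule(
        name="high_latency",
        condition=lambda f: f.get("request_latency_ms", 0.0) > 500.0,
        advice="requests are slow; inspect load on the affected node",
    ),
]


def diagnose(facts: Facts, rules: List[Rule] = RULES) -> List[str]:
    """Return one finding per rule whose condition matches the facts."""
    return [f"{r.name}: {r.advice}" for r in rules if r.condition(facts)]


if __name__ == "__main__":
    snapshot = {"disk_used_fraction": 0.97, "request_latency_ms": 120.0}
    for finding in diagnose(snapshot):
        print(finding)
```

Keeping rules as data separate from the matching loop means new diagnostic knowledge can be added without touching the engine, which is the usual motivation for an expert-system design.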
