Description
The IT-ST group at CERN runs and evaluates innovative cloud storage technologies and applies them to big data problems in high-energy physics research. One of its main focuses is EOS, CERN's multi-petabyte disk-based storage service built from commodity hardware and heavily used by both LHC and non-LHC experiments. The massive scale at which EOS runs leaves room for a variety of issues and anomalies to creep in. These need to be dealt with in real time to ensure smooth operations.
The project aims to improve troubleshooting and diagnosis of the components that make up the EOS infrastructure by developing an Expert System. The system collects diagnostic information, such as metrics, signals, and alerts, from each of the namespaces and helps engineers reduce the time spent debugging issues, thereby automating some of the troubleshooting tasks.