Description
The EOS deployment at CERN is a core service used both for scientific data
processing and analysis and as a back end for general end-user storage (e.g. home directories/CERNBOX).
The disk failure metrics collected over a period of one year from a deployment
of some 70k disks allow a first systematic analysis of the behaviour
of different hard disk types for the large CERN use cases.
In this presentation we will describe the data collection and analysis,
summarise the measured rates and compare them with other large disk
deployments. In the second part of the presentation we will present a first
attempt to use the collected failure and SMART metrics to develop a machine
learning model that predicts imminent failures and hence helps avoid service degradation
and repair costs.
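
As a rough illustration of how failure rates are often made comparable across deployments of different sizes, the sketch below computes an annualised failure rate (failures per drive-year). The abstract does not state which rate definition the analysis uses, so this convention is an assumption. The function name and the numbers are hypothetical and are not CERN measurements.

```python
def annualized_failure_rate(failures: int, drive_days: float) -> float:
    """Return failures per drive-year as a fraction (0.015 means 1.5%).

    Assumed convention, not taken from the abstract:
    AFR = failures / (drive_days / 365).
    """
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365.0
    return failures / drive_years


# Hypothetical fleet: 1000 disks observed for 365 days each, 15 failures.
afr = annualized_failure_rate(failures=15, drive_days=1000 * 365)
print(f"{afr:.2%}")  # prints 1.50%
```

Counting drive-days rather than disks keeps the rate meaningful when disks enter or leave service during the observation period.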