Description
The Operational Intelligence (OpInt) project is a joint effort from
various WLCG communities aimed at increasing the level of automation
in computing operations and reducing human interventions. The currently deployed systems have proven to be mature and capable of meeting the experiments' goals by allowing timely delivery of scientific results. However, a substantial number of interventions from software developers, shifters, and operational teams are needed to manage such heterogeneous infrastructures efficiently.
Under the scope of the OpInt project, experts from most of the relevant areas
have gathered to propose and work on “smart” solutions. Machine learning,
data mining, log analysis, and anomaly detection are only some of the tools we
have evaluated for our use cases. Discussions have led to a number of ideas on
how to achieve our goals, and the development of solutions has started. In this
contribution, we will report on the development of a suite of OpInt services that
cover various use cases in workload management, data management, and site
operations.