Description
In the exabyte era, physical science research infrastructures will have to process massive quantities of raw data, relying on large heterogeneous computing facilities. In the LHCb context, the ODISSEE project uses AI tools and techniques to maximize the computational performance and reliability of these facilities while reducing their energy consumption and total cost of ownership. By leveraging the massive historical dataset of the LHCb Data Centre, it is possible to develop methods for optimizing data center cooling and for dynamically distributing computational tasks according to load requirements. The same monitoring information can be used to train a predictor of potential failures and to design a Digital Twin of the Data Centre. In this contribution, we show how AI can improve the operations, efficiency, and sustainability of large-scale computing infrastructures in modern high-throughput physics experiments.
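To make the failure-prediction idea concrete, the sketch below shows one simple way monitoring time series can be turned into early-warning signals. This is a minimal toy illustration, not the ODISSEE method: it assumes a stream of per-node temperature readings and scores each reading by its deviation from an exponentially weighted moving average (EWMA) of the recent history, flagging sudden departures such as a cooling anomaly. All names, thresholds, and the example trace are hypothetical.

```python
from collections import deque

def anomaly_scores(readings, alpha=0.3, window=20):
    """Score each reading by its deviation from an EWMA of the history.

    A score well above the typical recent deviation marks the reading
    as a potential precursor of failure (e.g. a cooling anomaly).
    Toy sketch: real monitoring pipelines would use richer models
    and many correlated metrics, not a single temperature series.
    """
    scores = []
    ewma = readings[0]              # initialise the baseline
    recent_dev = deque(maxlen=window)  # recent absolute deviations
    for x in readings:
        dev = abs(x - ewma)
        # Normalise by the mean recent deviation; the 0.1 floor is an
        # arbitrary tolerance to avoid dividing by (near) zero.
        baseline = sum(recent_dev) / len(recent_dev) if recent_dev else 0.0
        scores.append(dev / max(baseline, 0.1))
        recent_dev.append(dev)
        ewma = alpha * x + (1 - alpha) * ewma
    return scores

# Hypothetical trace: stable node temperatures, then a cooling fault.
trace = [40.0] * 30 + [40.5, 41.2, 43.0, 47.0, 55.0]
scores = anomaly_scores(trace)
alerts = [i for i, s in enumerate(scores) if s > 5.0]
```

On this trace the alerts cluster at the onset of the simulated fault, illustrating how even a simple statistical baseline over historical monitoring data can surface degradation before a hard failure; a trained predictor generalizes the same idea across many metrics and nodes.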