25–29 May 2026
Chulalongkorn University
Asia/Bangkok timezone

Energy-aware compute resource modulation at the WLCG PIC Tier-1 site: drainage strategies, CPU frequency scaling, and predictive control

26 May 2026, 16:15
18m

Oral Presentation
Track 7 - Computing infrastructure and sustainability

Speaker

Jose Flix Molina (CIEMAT - Centro de Investigaciones Energéticas Medioambientales y Tec. (ES))

Description

The rapid growth in data centre energy demand poses significant challenges for the sustainability of large-scale scientific computing. In alignment with CERN and WLCG strategies on environmentally responsible computing, this work investigates methods to reduce energy consumption, electricity costs, and CO₂ emissions at the PIC WLCG Tier-1 site through energy-aware compute resource modulation.

Three complementary studies are presented. First, simulated natural job drainages were applied to real HTCondor utilisation data from 2023–2024 to assess the impact of temporarily halting job acceptance during periods of high electricity prices or carbon intensity. While this approach achieved limited economic and environmental gains, it resulted in disproportionate computational losses, primarily due to non-energy-aware scheduling, hardware heterogeneity, hyperthreading effects, and long job runtimes. These results highlight the limitations of naïve drainage strategies.
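As an illustration only, a natural drainage can be sketched on a toy job list: jobs already running are left to finish, and jobs that would start inside the drain window are not accepted. The `Job` record, the window bounds, and the four example jobs are hypothetical and are not taken from the PIC HTCondor data.

```python
from dataclasses import dataclass


@dataclass
class Job:
    start: float  # hours since the start of the trace
    end: float    # hours since the start of the trace
    cores: int


def simulate_drain(jobs, drain_start, drain_end):
    """Split jobs into those kept and those rejected by a natural drain.

    Running jobs are never killed; only jobs whose start time falls
    inside [drain_start, drain_end) are refused.
    """
    kept, rejected = [], []
    for job in jobs:
        if drain_start <= job.start < drain_end:
            rejected.append(job)
        else:
            kept.append(job)
    return kept, rejected


def core_hours(jobs, t0, t1):
    """Core-hours delivered by `jobs` inside the interval [t0, t1]."""
    total = 0.0
    for job in jobs:
        overlap = min(job.end, t1) - max(job.start, t0)
        if overlap > 0:
            total += overlap * job.cores
    return total


# Hypothetical toy trace.
jobs = [
    Job(0.0, 30.0, 8),   # long job, already running when the drain begins
    Job(10.0, 14.0, 4),  # would start inside the drain window
    Job(11.0, 12.0, 1),  # would start inside the drain window
    Job(20.0, 26.0, 8),  # starts after the window
]
drain_start, drain_end = 9.0, 15.0

kept, rejected = simulate_drain(jobs, drain_start, drain_end)
baseline_in_window = core_hours(jobs, drain_start, drain_end)
drained_in_window = core_hours(kept, drain_start, drain_end)
lost_total = sum((j.end - j.start) * j.cores for j in rejected)

print(f"load in window without drain: {baseline_in_window} core-hours")
print(f"load in window with drain:    {drained_in_window} core-hours")
print(f"work refused by the drain:    {lost_total} core-hours")
```

In this toy trace the long-running job keeps most of the load active throughout the window, which is one way long job runtimes can limit what a drain saves.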

Second, dedicated experiments were conducted to evaluate the impact of dynamically adjusting CPU clock frequencies across PIC compute nodes. The study quantifies the relationship between CPU frequency, delivered compute performance, power consumption, and energy efficiency, demonstrating that frequency scaling can offer meaningful reductions in power draw and operational costs with controlled performance degradation. This enables finer-grained, node-level modulation of the compute farm compared to natural drainage strategies.
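The trade-off between frequency, performance, and power can be illustrated with a textbook-style toy model. Everything here is an assumption made for illustration: a constant static power term, a dynamic term that grows with the cube of the frequency, throughput proportional to frequency, and the numeric constants. None of it reproduces the measurements made on PIC nodes.

```python
def power_watts(f_ghz, p_static=100.0, k=10.0):
    """Toy node power model: static part plus a cubic dynamic part."""
    return p_static + k * f_ghz ** 3


def throughput(f_ghz):
    """Toy performance model: work per hour proportional to frequency."""
    return f_ghz


def energy_per_work(f_ghz):
    """Energy spent per unit of work at a given frequency (lower is better)."""
    return power_watts(f_ghz) / throughput(f_ghz)


# Scan a hypothetical frequency range from 1.0 to 3.0 GHz in 0.1 GHz steps.
freqs = [1.0 + 0.1 * i for i in range(21)]
best = min(freqs, key=energy_per_work)

for f in (3.0, 2.0, best):
    print(f"{f:.1f} GHz: {power_watts(f):6.1f} W, "
          f"{energy_per_work(f):6.1f} W per unit of throughput")
```

Under these assumed constants, lowering the clock from 3.0 GHz to 2.0 GHz cuts throughput by a third while the modelled power falls by about half. Real nodes need measured curves, since static power and voltage behaviour vary across hardware generations.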

Finally, an XGBoost-based machine learning model was developed to predict CPU core availability following real-time drainage decisions using only information available at decision time. Trained on two years of site-specific HTCondor data, the model accurately forecasts core reductions, particularly in the 8–40 hour window after a drainage event, enabling proactive and informed resource management.
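A minimal sketch of how training rows for such a predictor might be built from historical job records follows. The feature names, the tuple layout, and the toy history are hypothetical, since the abstract does not list the model inputs. The features use only start times and core counts, which are known at decision time, while the label uses job end times, which are only known in hindsight.

```python
def snapshot_features(running, now):
    """Features computable at decision time.

    `running` is a list of (start_hour, cores) tuples for the jobs
    running at `now`; job end times are deliberately not used.
    """
    elapsed = [now - start for start, _ in running]
    return {
        "cores_busy": sum(cores for _, cores in running),
        "n_jobs": len(running),
        "mean_elapsed_h": sum(elapsed) / len(elapsed) if elapsed else 0.0,
        "max_elapsed_h": max(elapsed, default=0.0),
    }


def cores_remaining(jobs, now, horizon_h):
    """Label from history: cores still busy `horizon_h` hours after a drain
    issued at `now`, assuming no new jobs are accepted after the drain."""
    return sum(
        cores
        for start, end, cores in jobs
        if start <= now and end > now + horizon_h
    )


# Hypothetical history of (start_hour, end_hour, cores) records.
history = [(0.0, 30.0, 8), (5.0, 12.0, 4), (8.0, 50.0, 16), (20.0, 26.0, 8)]
now = 10.0

running = [(s, c) for s, e, c in history if s <= now < e]
x = snapshot_features(running, now)
y = {h: cores_remaining(history, now, h) for h in (8, 24, 40)}

print(x)
print(y)
```

Rows of this kind, one per candidate decision time and horizon, could then be passed to a gradient-boosted regressor such as XGBoost. The fitting step is omitted here to keep the sketch free of external dependencies.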

Together, these results provide actionable insights and practical tools for implementing energy-aware scheduling and control at PIC, and offer a scalable framework applicable to other WLCG sites pursuing sustainable computing operations.

Author

Jose Flix Molina (CIEMAT - Centro de Investigaciones Energéticas Medioambientales y Tec. (ES))

Co-authors

Carlos Acosta Silva (PIC)
Gonzalo Merino (IFAE - Institute for High Energy Physics)
Jordi Casals Hernandez (Port d'Informació Científica (PIC))
Ms Judith Murillo (UAB)
Mr Kevin Fábrega (UAB)
