Description
To mitigate the rising energy demand and CO₂ emissions of large-scale scientific computing, this study evaluated energy-aware resource management at the WLCG Tier-1 site at PIC. We initially used machine learning to optimize node drainage by routing short jobs during periods of peak energy prices. However, this scheduling approach proved impractical because it required constant retraining as workloads fluctuated.
Consequently, we identified real-time CPU frequency scaling as a superior alternative. Unlike complex scheduling, adjusting CPU frequency is straightforward, delivers immediate power reductions, and keeps performance degradation controllable. Node-level experiments validated this approach by quantifying the relationship between CPU frequency, compute performance, and power draw. We used these results to estimate facility-wide savings. Together, this work provides a practical, scalable framework for automated, center-wide energy optimization at PIC and other WLCG sites.
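
As an illustration only, the step from node-level measurements to a facility-wide estimate could be sketched as follows. This is a minimal sketch, not the method used in the study: the names `FrequencyPoint`, `pick_frequency`, and `facility_saving_kwh` are hypothetical, and every number is a placeholder rather than a measured result. It also assumes that all nodes behave like the measured one and that power draw is constant over the capping window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrequencyPoint:
    """One node-level measurement (all values here are placeholders)."""
    freq_ghz: float   # CPU frequency cap applied to the node
    power_w: float    # average node power draw at that cap, in watts
    rel_perf: float   # throughput relative to nominal frequency (1.0 = nominal)


def pick_frequency(points, min_rel_perf):
    """Return the lowest-power point whose throughput stays at or above min_rel_perf."""
    allowed = [p for p in points if p.rel_perf >= min_rel_perf]
    if not allowed:
        raise ValueError("no measured frequency meets the performance floor")
    return min(allowed, key=lambda p: p.power_w)


def facility_saving_kwh(nominal, capped, n_nodes, hours):
    """Energy saved (kWh) by running n_nodes at the capped point for a time window.

    Assumes identical nodes and constant power draw. Work lost to the
    reduced throughput during the window is not compensated for here.
    """
    return (nominal.power_w - capped.power_w) * n_nodes * hours / 1000.0


# Hypothetical measurement table, ordered by frequency; not data from the study.
points = [
    FrequencyPoint(freq_ghz=2.0, power_w=300.0, rel_perf=0.80),
    FrequencyPoint(freq_ghz=2.5, power_w=360.0, rel_perf=0.92),
    FrequencyPoint(freq_ghz=3.0, power_w=450.0, rel_perf=1.00),
]

nominal = points[-1]
capped = pick_frequency(points, min_rel_perf=0.90)
saving = facility_saving_kwh(nominal, capped, n_nodes=100, hours=4)
```

The performance floor (`min_rel_perf`) is what keeps the degradation controllable in this sketch: the operator sets the acceptable throughput loss, and the lowest-power frequency that satisfies it is selected. On Linux, such a cap could be applied through the kernel's cpufreq sysfs interface (for example, `scaling_max_freq`), although the abstract does not say which mechanism was used.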