Description
Increasingly intensive AI and simulation workloads are driving thermal stress across large-scale HPC environments. As compute centres prepare for the next step in performance, conventional optimisation practices no longer align with environmental, social and governance (ESG) targets or with hardware lifecycle requirements. This contribution presents an infrastructure-level methodology for energy-aware runtime orchestration, evaluated in industrial-scale environments, that improves thermal stability without compromising computational output.
The approach adapts optimisation settings at runtime, based on inferences drawn from observed system and workload behaviour. Evaluation across industrial-scale AI and simulation environments demonstrated measurable energy reductions while staying within expected performance tolerances, a more stable thermal distribution, and improved lifecycle consistency. These results were achieved without modifying the existing cluster architecture or introducing measurable latency.
Unlike traditional static tuning models, the method continuously aligns system state with workload and environmental conditions, providing sustained efficiency without relying on vendor-specific implementations. It offers a practical path towards ESG-aligned high-performance computing at scale and leaves room for future integration of workload intelligence to strengthen resilience under increasing operational pressure.