Description
The increasing use of GPUs and accelerator-based computing for simulation, reconstruction and machine learning has significantly expanded scientific capabilities in HEP. However, these workloads also introduce new challenges in terms of energy consumption, operational cost and overall carbon footprint, especially as computing demand grows with future experiments.
This contribution presents a metrics-driven framework designed to quantify and improve workload efficiency across both on-premises and cloud environments. Built on cloud-native observability components, the system defines a unified set of cost and carbon metrics, enabling consistent reporting and comparative analysis across heterogeneous infrastructures. These metrics turn operational data into actionable insights that support sustainable decision-making at every level, from project-level resource planning to user-level workload optimization.
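As an illustration only, the following minimal sketch shows one way a unified cost and carbon metric could be derived from operational data. It is not the framework's actual implementation: the `WorkloadSample` record, its field names, and the formulas (energy from average power and runtime, cost from a price per kWh, emissions from a grid carbon intensity) are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadSample:
    """Hypothetical per-job record; all field names are illustrative assumptions."""
    name: str
    avg_power_watts: float       # mean power draw over the job
    runtime_hours: float         # wall-clock duration of the job
    price_per_kwh: float         # electricity (or cloud-equivalent) price
    grid_gco2e_per_kwh: float    # carbon intensity of the hosting site's grid


def energy_kwh(sample: WorkloadSample) -> float:
    """Energy used by the job in kWh (watts * hours / 1000)."""
    return sample.avg_power_watts * sample.runtime_hours / 1000.0


def cost(sample: WorkloadSample) -> float:
    """Energy cost of the job, in the currency of price_per_kwh."""
    return energy_kwh(sample) * sample.price_per_kwh


def carbon_kg(sample: WorkloadSample) -> float:
    """Estimated emissions in kg CO2e (grams converted to kilograms)."""
    return energy_kwh(sample) * sample.grid_gco2e_per_kwh / 1000.0


def report(samples: list[WorkloadSample]) -> list[dict]:
    """Apply the same metric definitions to every sample, whatever its origin."""
    return [
        {
            "name": s.name,
            "energy_kwh": energy_kwh(s),
            "cost": cost(s),
            "carbon_kg": carbon_kg(s),
        }
        for s in samples
    ]
```

Because on-premises and cloud jobs would pass through the same functions, their reports can be compared directly, which is the kind of consistency the unified metric set is meant to provide.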
We demonstrate how HEP workflows leverage the framework to monitor performance, identify inefficiencies, and evaluate configuration alternatives. Early results show how standardized observability enables improved infrastructure utilization and provides transparent visibility into the environmental impact of scientific applications, thus contributing to a more sustainable computing strategy for CERN and the broader research community.