23–27 Sept 2024
Nikhef
Europe/Amsterdam timezone

Heterogeneous Tier2 Cluster and Power Efficiency Studies at ScotGrid Glasgow

27 Sept 2024, 12:00
20m
Colloquium room (Nikhef)

Nikhef Science Park 105 1098 XG Amsterdam
HTCondor user presentations Workshop Session

Speaker

Emanuele Simili

Description

With the recent addition of 4,000 ARM cores, the ScotGrid Glasgow facility is a pioneering example of a heterogeneous WLCG Tier2 site. The new hardware has enabled large-scale testing by the experiments and detailed investigations of ARM performance in a production environment.

I will present an overview of our computing cluster, which uses HTCondor as the batch system and ARC-CE as the front end for job submission, authentication, and user mapping, with particular emphasis on dual-queue management. I will also touch on our monitoring and central logging system, built on Prometheus, Loki, and Grafana, and describe the custom scripts we use to extract job information from HTCondor and feed it to the node_exporter collector.
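As an illustration of the kind of script mentioned above, here is a minimal sketch (not the scripts used at Glasgow) that turns HTCondor job ClassAds into the Prometheus text exposition format and writes them where node_exporter's textfile collector, enabled with `--collector.textfile.directory`, picks up `*.prom` files. It assumes the job ads have already been fetched, for example with `condor_q`, and reduced to dicts carrying the standard `Owner`, `JobStatus` and `RequestCpus` attributes; the metric names `condor_jobs` and `condor_running_cores` and the output file name are hypothetical.

```python
import os
import tempfile
from collections import Counter

# Standard HTCondor JobStatus codes mapped to readable label values.
JOB_STATUS = {1: "idle", 2: "running", 3: "removed", 4: "completed",
              5: "held", 6: "transferring_output", 7: "suspended"}


def jobs_to_prom(jobs):
    """Render per-owner job counts and running cores as Prometheus text.

    `jobs` is a list of dicts holding HTCondor ClassAd attributes.
    The metric names used here are hypothetical.
    """
    counts = Counter()  # (owner, state) -> number of jobs
    cores = Counter()   # owner -> cores requested by running jobs
    for job in jobs:
        owner = job.get("Owner", "unknown")
        state = JOB_STATUS.get(job.get("JobStatus"), "unknown")
        counts[(owner, state)] += 1
        if state == "running":
            cores[owner] += int(job.get("RequestCpus", 1))

    lines = ["# HELP condor_jobs Number of jobs by owner and state.",
             "# TYPE condor_jobs gauge"]
    for (owner, state), n in sorted(counts.items()):
        lines.append(f'condor_jobs{{owner="{owner}",state="{state}"}} {n}')
    lines += ["# HELP condor_running_cores Cores requested by running jobs.",
              "# TYPE condor_running_cores gauge"]
    for owner, n in sorted(cores.items()):
        lines.append(f'condor_running_cores{{owner="{owner}"}} {n}')
    return "\n".join(lines) + "\n"


def write_atomically(text, directory, name="condor_jobs.prom"):
    """Write to a temp file in the same directory, then rename it into place."""
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write(text)
    os.replace(tmp, os.path.join(directory, name))
```

Writing to a temporary file and renaming it means the collector never scrapes a half-written file, and the `.tmp` suffix keeps the intermediate file from matching `*.prom`. Run periodically (e.g. from cron), this keeps the job metrics on each node's `/metrics` endpoint current.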

I will also highlight our research on power efficiency in HEP computing, presenting the benchmarks and tools we use to measure and analyze power data. In particular, I will present a new figure of merit designed to characterize power usage during execution of the HEP-Score benchmark, along with an updated performance-per-watt comparison extended to the latest x86 and ARM CPUs (Ampere Altra Q80 and M80, NVIDIA Grace, and recent AMD EPYC chips). In this context, I will introduce a frequency-scan methodology to better characterize performance-per-watt trade-offs.
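A performance-per-watt comparison and a frequency scan both reduce to simple arithmetic on a benchmark score and a power trace. The sketch below assumes that each run at a given CPU frequency cap yields one score (e.g. a HEP-Score result) and a series of timestamped power readings; it computes a time-weighted average power with the trapezoidal rule and picks the setting with the highest score per watt. This is the generic score/W metric, not the new figure of merit presented in the talk, which the abstract does not define; `ScanPoint` and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScanPoint:
    freq_ghz: float       # CPU frequency cap used for this run
    score: float          # benchmark score obtained at that cap
    times_s: List[float]  # timestamps of power samples, in seconds
    watts: List[float]    # power readings at those timestamps, in W


def average_power(times_s, watts):
    """Time-weighted mean power: trapezoidal energy divided by duration."""
    if len(times_s) < 2:
        raise ValueError("need at least two power samples")
    energy_j = sum((t1 - t0) * (p0 + p1) / 2
                   for t0, t1, p0, p1 in zip(times_s, times_s[1:],
                                             watts, watts[1:]))
    return energy_j / (times_s[-1] - times_s[0])


def perf_per_watt(point):
    """Benchmark score per average watt for one scan point."""
    return point.score / average_power(point.times_s, point.watts)


def best_frequency(points):
    """Return (frequency, score/W) of the most power-efficient setting."""
    best = max(points, key=perf_per_watt)
    return best.freq_ghz, perf_per_watt(best)
```

Integrating the power trace rather than averaging raw samples keeps the result correct when the sampling interval is irregular. The highest-throughput frequency is not necessarily the most efficient one, which is the trade-off a frequency scan is meant to expose.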

Desired slot length 15-20 minutes
Speaker release Yes

Co-authors

Albert Gyorgy Borbely (University of Glasgow (GB))
David Britton (University of Glasgow (GB))
Gordon Stewart
Samuel Cadellin Skipsey