18–22 Jan 2016
UTFSM, Valparaíso (Chile)
Chile/Continental timezone

Cluster Optimization with Evaluation of Memory and CPU usage via Cgroups of ATLAS & GridPP workloads running at a Tier-2

18 Jan 2016, 16:10
25m
UTFSM, Valparaíso (Chile)

Avenida España 1680, Valparaíso Chile
Oral
Computing Technology for Physics Research
Track 1

Speaker

Prof. David Britton (University of Glasgow)

Description

Modern Linux kernels include control groups (cgroups), a feature set for controlling and monitoring system resources. Cgroups have been enabled on a production HTCondor pool at the Glasgow site of the UKI-SCOTGRID distributed Tier-2. A system has been put in place to collect and aggregate cgroup metrics from all worker nodes in the pool. Memory and CPU usage footprints are extracted from the aggregated data, and from these footprints the resource usage of each type of ATLAS and GridPP workload can be obtained and studied. The system has been used, among other things, to identify broken payloads and to measure real-world memory usage and job efficiencies. It has been running in production for one year and has collected a large amount of data. These statistics show the difference between the memory originally requested and the real-world memory usage of different types of jobs. The results were used to reduce the amount of memory requested from the batch system for scheduling purposes, and an increase in cluster utilisation of around 10% was observed. Analysing overall real-world job performance in this way has allowed us to increase the utilisation of the Glasgow site of the UKI-SCOTGRID distributed Tier-2.
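The collection and comparison steps described above can be illustrated with a minimal Python sketch. It is not the system used at Glasgow. It assumes cgroup v1 accounting files (memory.max_usage_in_bytes and cpuacct.usage), takes the per-job cgroup directories as arguments because their location depends on the site configuration, and uses hypothetical field names (job_type, requested_mem_mb, wall_s) for the aggregated records. CPU efficiency is computed under the assumption of single-core jobs.

```python
from pathlib import Path
from statistics import median


def read_cgroup_metrics(memory_dir, cpuacct_dir):
    """Read peak memory and cumulative CPU time for one job's cgroup (cgroup v1).

    The directory arguments are assumptions: where a batch system places
    per-job cgroups depends on how the site has configured it.
    """
    peak_bytes = int(Path(memory_dir, "memory.max_usage_in_bytes").read_text())
    cpu_ns = int(Path(cpuacct_dir, "cpuacct.usage").read_text())  # nanoseconds
    return {"peak_mem_mb": peak_bytes / 1024**2, "cpu_s": cpu_ns / 1e9}


def summarise(samples):
    """Group per-job records by workload type and build a usage footprint.

    Each record is a dict with the hypothetical keys job_type,
    requested_mem_mb, peak_mem_mb, cpu_s and wall_s.
    """
    by_type = {}
    for sample in samples:
        by_type.setdefault(sample["job_type"], []).append(sample)

    summary = {}
    for job_type, jobs in by_type.items():
        summary[job_type] = {
            "jobs": len(jobs),
            "median_requested_mem_mb": median(j["requested_mem_mb"] for j in jobs),
            "median_peak_mem_mb": median(j["peak_mem_mb"] for j in jobs),
            "max_peak_mem_mb": max(j["peak_mem_mb"] for j in jobs),
            # Single-core assumption: total CPU time over total wall time.
            "cpu_efficiency": sum(j["cpu_s"] for j in jobs)
            / sum(j["wall_s"] for j in jobs),
        }
    return summary
```

A workload type whose median and maximum peak memory sit well below its requested memory is a candidate for a smaller scheduling request, which is the kind of comparison the abstract describes.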

Primary author

Gang Qin (University of Glasgow (GB))

Co-authors

David Britton (University of Glasgow (GB))
David Crooks (University of Glasgow (GB))
Gareth Douglas Roy (University of Glasgow (GB))
Dr Gordon Stewart (University of Glasgow)
Samuel Cadellin Skipsey
