Mar 23 – 27, 2015
Physics Department, Oxford University
Europe/London timezone

Evaluation of Memory and CPU usage via Cgroups of ATLAS workloads running at a Tier-2

Mar 25, 2015, 2:50 PM
25m
Martin Wood Lecture Theatre, Parks Road (Physics Department, Oxford University)

Computing & Batch Services
Computing and Batch Systems

Speaker

Gang Qin (University of Glasgow (GB))

Description

Modern Linux kernels include a feature set called Cgroups that enables the control and monitoring of system resources. Cgroups have been enabled on a production HTCondor pool at the Glasgow site of the UKI-SCOTGRID distributed Tier-2. A system has been put in place to collect and aggregate metrics extracted from Cgroups on all worker nodes within the Condor pool. From this aggregated data, memory and CPU usage footprints are extracted, and from these footprints the resource usage of each type of ATLAS workload can be obtained and studied. The system has been used to identify broken payloads and to measure real-world memory usage and job efficiencies, among other things. Additionally, work has begun on near-real-time tracking of running jobs, with the goal of proactively identifying and stopping broken payloads before they consume unnecessary CPU time and resources.
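
As an illustration of the kind of per-job metric collection described above, the following is a minimal sketch, not the system used at Glasgow. It assumes the cgroup v1 layout, in which the kernel exposes the counters memory.usage_in_bytes, memory.max_usage_in_bytes and cpuacct.usage (cumulative CPU time in nanoseconds) in each cgroup directory. The function names, the directory arguments and the efficiency definition (CPU time divided by wall time times cores) are hypothetical choices made for this example; how job cgroup directories are named on a given HTCondor pool depends on local configuration.

```python
from pathlib import Path


def read_int(path):
    """Read a single integer counter from a cgroup control file."""
    return int(Path(path).read_text().strip())


def sample_cgroup(memory_dir, cpuacct_dir):
    """Take one sample of memory and CPU counters for a single job cgroup.

    memory_dir and cpuacct_dir are the job's directories under the memory
    and cpuacct controllers (cgroup v1). Their locations are an assumption
    of this sketch and vary between sites.
    """
    memory_dir = Path(memory_dir)
    cpuacct_dir = Path(cpuacct_dir)
    return {
        # Current memory charged to the cgroup, in bytes.
        "mem_bytes": read_int(memory_dir / "memory.usage_in_bytes"),
        # Highest memory usage recorded so far, in bytes.
        "mem_peak_bytes": read_int(memory_dir / "memory.max_usage_in_bytes"),
        # Cumulative CPU time consumed by the cgroup, in nanoseconds.
        "cpu_ns": read_int(cpuacct_dir / "cpuacct.usage"),
    }


def cpu_efficiency(first, second, wall_seconds, cores=1):
    """CPU efficiency between two samples: CPU seconds / (wall seconds * cores)."""
    cpu_seconds = (second["cpu_ns"] - first["cpu_ns"]) / 1e9
    return cpu_seconds / (wall_seconds * cores)
```

Sampling each job cgroup at a fixed interval and storing the results per job would give a time series from which a memory footprint and a running efficiency can be derived. A job whose efficiency stays near zero over many consecutive intervals is one possible signal of a broken payload.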

Primary authors

Gang Qin (University of Glasgow (GB))
Gareth Roy (University of Glasgow)

Co-authors

Prof. David Britton (University of Glasgow (GB))
David Crooks (University of Glasgow (GB))
Dr Gordon Stewart (University of Glasgow)
Dr Samuel Cadellin Skipsey
