Evaluation of Memory and CPU usage via Cgroups of ATLAS workloads running at a Tier-2

Gang Qin (University of Glasgow (GB))


Modern Linux kernels include a feature set, called Cgroups, that enables the control and monitoring of system resources. Cgroups have been enabled on a production HTCondor pool at the Glasgow site of the UKI-SCOTGRID distributed Tier-2. A system has been put in place to collect and aggregate metrics extracted from Cgroups on all worker nodes within the HTCondor pool. Memory and CPU usage footprints are extracted from this aggregated data, and from these footprints the resource usage of each type of ATLAS workload can be obtained and studied. The system has been used to identify broken payloads and to measure real-world memory usage and job efficiencies, among other things. Additionally, work has begun on near-real-time tracking of running jobs, with the goal of proactively identifying and stopping broken payloads before they consume unnecessary CPU time and resources.
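
As a rough illustration of the kind of per-job metric extraction described above, the following minimal sketch reads peak memory and cumulative CPU time from a job's Cgroup directories and derives a CPU efficiency. It is not the collection system used at Glasgow. It assumes the cgroup v1 layout, with the control files memory.max_usage_in_bytes and cpuacct.usage (the latter in nanoseconds); the function names, the dictionary keys and the directory arguments are hypothetical and chosen only for this example.

```python
from pathlib import Path


def read_int(path):
    """Read a single integer value from a cgroup control file."""
    return int(Path(path).read_text().strip())


def read_job_footprint(memory_dir, cpuacct_dir):
    """Return peak memory (bytes) and cumulative CPU time (seconds) for one job.

    Assumption: cgroup v1, where the memory and cpuacct controllers expose
    memory.max_usage_in_bytes and cpuacct.usage (nanoseconds) respectively.
    The two directories are the job's cgroup under each controller; how they
    are located on a worker node is site-specific and not shown here.
    """
    memory_dir, cpuacct_dir = Path(memory_dir), Path(cpuacct_dir)
    return {
        "peak_memory_bytes": read_int(memory_dir / "memory.max_usage_in_bytes"),
        "cpu_seconds": read_int(cpuacct_dir / "cpuacct.usage") / 1e9,
    }


def cpu_efficiency(cpu_seconds, wall_seconds, cores=1):
    """CPU time used divided by the CPU time available (wall time x cores)."""
    if wall_seconds <= 0 or cores <= 0:
        raise ValueError("wall_seconds and cores must be positive")
    return cpu_seconds / (wall_seconds * cores)
```

Sampling these values periodically for each running job, rather than once at job exit, is one way a near-real-time view could be built: a job whose CPU efficiency stays close to zero over a long wall-clock interval would be a candidate broken payload.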

Gang Qin (University of Glasgow (GB)); Gareth Roy (University of Glasgow)


Prof. David Britton (University of Glasgow (GB)); David Crooks (University of Glasgow (GB)); Dr Gordon Stewart (University of Glasgow); Dr Samuel Cadellin Skipsey

