Speaker
Gang Qin
(University of Glasgow (GB))
Description
Modern Linux kernels include a feature set called Cgroups (control
groups) that enables the control and monitoring of system resources.
Cgroups have been enabled on a production HTCondor pool at the Glasgow
site of the UKI-SCOTGRID distributed Tier-2. A system has been put in
place to collect and aggregate metrics extracted from Cgroups on all
worker nodes within the HTCondor pool. From this aggregated data, memory
and CPU usage footprints are extracted.
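As a rough illustration of the collection step, the sketch below parses the "key value" text found in Cgroup files such as `memory.stat` and reduces a series of per-job samples to a footprint. It is not the system described in this abstract: the `Sample` and `Footprint` types and both function names are hypothetical, and it assumes cumulative CPU time is reported in nanoseconds, as in the cgroup v1 `cpuacct.usage` file.

```python
from dataclasses import dataclass
from typing import Dict, List


def parse_cgroup_stat(text: str) -> Dict[str, int]:
    """Parse 'key value' lines (the layout of files such as memory.stat)."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not exactly "<name> <non-negative integer>".
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


@dataclass
class Sample:  # hypothetical record of one reading for one job
    timestamp: float  # seconds since the epoch
    rss_bytes: int    # resident memory at sample time
    cpu_ns: int       # cumulative CPU time in nanoseconds (assumption)


@dataclass
class Footprint:  # hypothetical per-job summary
    peak_rss_bytes: int
    cpu_seconds: float
    wall_seconds: float


def footprint(samples: List[Sample]) -> Footprint:
    """Reduce a job's samples to peak memory, CPU time and wall time."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples, key=lambda s: s.timestamp)
    first, last = ordered[0], ordered[-1]
    return Footprint(
        peak_rss_bytes=max(s.rss_bytes for s in ordered),
        cpu_seconds=(last.cpu_ns - first.cpu_ns) / 1e9,
        wall_seconds=last.timestamp - first.timestamp,
    )
```

On a worker node the text passed to `parse_cgroup_stat` would be read from a file under the Cgroup mount point. The exact path depends on the Cgroup version and the HTCondor configuration, so it is deliberately left out here.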
From the extracted footprints, the resource usage of each type of
ATLAS workload can be obtained and studied. This system has been used
to identify broken payloads and to measure real-world memory usage and
job efficiencies, among other uses. Additionally, work has begun on
near-real-time tracking of running jobs, with the goal of proactively
identifying and stopping broken payloads before they consume
unnecessary CPU time and resources.
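One possible way to turn such footprints into a job efficiency figure, and to flag a running job that has stopped making progress, is sketched below. This is an assumed heuristic for illustration only, not the method used at the Glasgow site: the function names, the 30-minute window and the 5% threshold are all hypothetical choices.

```python
from typing import List, Tuple


def cpu_efficiency(cpu_seconds: float, wall_seconds: float, cores: int = 1) -> float:
    """CPU time used divided by the CPU time available (wall time x cores)."""
    if wall_seconds <= 0 or cores <= 0:
        raise ValueError("wall_seconds and cores must be positive")
    return cpu_seconds / (wall_seconds * cores)


def looks_stalled(
    samples: List[Tuple[float, float]],
    window_seconds: float = 1800.0,  # hypothetical look-back window
    min_efficiency: float = 0.05,    # hypothetical threshold
    cores: int = 1,
) -> bool:
    """Flag a job whose recent CPU efficiency is below min_efficiency.

    samples holds (timestamp, cumulative_cpu_seconds) pairs in ascending
    time order.
    """
    if len(samples) < 2:
        return False
    latest_t, latest_cpu = samples[-1]
    # Do not judge a job until a full window of history is available.
    if latest_t - samples[0][0] < window_seconds:
        return False
    cutoff = latest_t - window_seconds
    # Baseline: the most recent sample taken at or before the cutoff.
    base_t, base_cpu = [s for s in samples if s[0] <= cutoff][-1]
    recent = cpu_efficiency(latest_cpu - base_cpu, latest_t - base_t, cores)
    return recent < min_efficiency
```

Measuring efficiency over a recent window rather than over the whole job lifetime means a payload that ran well for hours and then hung is still flagged. A real deployment would also need to tolerate legitimately idle phases such as input staging.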
Primary authors
Gang Qin
(University of Glasgow (GB))
Gareth Roy
(University of Glasgow)
Co-authors
Prof. David Britton
(University of Glasgow (GB))
David Crooks
(University of Glasgow (GB))
Dr Gordon Stewart
(University of Glasgow)
Dr Samuel Cadellin Skipsey