Sep 21 – 25, 2020
(teleconference only)
Europe/Paris timezone

HTCondor monitoring at ScotGrid Glasgow

Sep 24, 2020, 5:40 PM
20m
https://cern.zoom.us/j/97987309455

HTCondor user presentations Workshop session

Speaker

Emanuele Simili (University of Glasgow)

Description

Our Tier-2 cluster (ScotGrid, Glasgow) uses HTCondor as its batch system, combined with ARC-CE as the front end for job submission and ARGUS for authentication and user mapping.
On top of this, we have built a central monitoring system based on Prometheus that collects and aggregates metrics and displays them on custom Grafana dashboards. In particular, we extract job information by regularly parsing the output of 'condor_status' on the condor_manager, scheduler and worker nodes.
A collection of graphs gives a quick overview of cluster performance and helps identify emerging issues. Logs from all nodes and services are also collected on a central Loki server and retained over time.
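The following Python sketch roughly illustrates the condor_status parsing step described above. It counts slots by State and Activity from `condor_status -af Name State Activity` output and renders the counts in the Prometheus text exposition format. It is not the ScotGrid implementation. The metric name `condor_slots`, the chosen attributes and the helper functions are hypothetical. One assumed way to feed this output into Prometheus is to write it to a `.prom` file that node_exporter's textfile collector reads.

```python
import subprocess
from collections import Counter

# Hypothetical choice of slot attributes; State and Activity are standard
# HTCondor slot ClassAd attributes.
ATTRS = ["Name", "State", "Activity"]


def query_condor_status() -> str:
    """Run condor_status with autoformat (-af) output.

    Requires the HTCondor command-line tools on PATH; not called below.
    """
    result = subprocess.run(
        ["condor_status", "-af"] + ATTRS,
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def count_slots(output: str) -> Counter:
    """Count slots per (State, Activity) pair from '-af Name State Activity' lines."""
    counts: Counter = Counter()
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        _name, state, activity = fields
        counts[(state, activity)] += 1
    return counts


def to_prometheus(counts: Counter) -> str:
    """Render the counts as a gauge in the Prometheus text exposition format."""
    lines = [
        "# HELP condor_slots Number of HTCondor slots by state and activity.",
        "# TYPE condor_slots gauge",
    ]
    for (state, activity), n in sorted(counts.items()):
        lines.append(f'condor_slots{{state="{state}",activity="{activity}"}} {n}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Sample output standing in for a live condor_status call.
    sample = (
        "slot1@wn01 Claimed Busy\n"
        "slot2@wn01 Unclaimed Idle\n"
        "slot1@wn02 Claimed Busy\n"
    )
    print(to_prometheus(count_slots(sample)), end="")
```

On a real node, `to_prometheus(count_slots(query_condor_status()))` would replace the sample string. Such a script could be run periodically, for example from cron.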

Desired slot length 15
Speaker release Yes

Primary author

Emanuele Simili (University of Glasgow)
