HEPiX Batch Monitoring Working Group
Fifemon is used at:
-
BNL: injects custom ClassAds into the hierarchy
-
CERN: working on packaging it as an RPM (will share); added metrics to monitor cluster utilization per experiment (and to react to under-utilization w.r.t. quota)
-
Mix: fifemon + collectd-based layer
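The per-experiment utilization check mentioned for CERN could look roughly like the sketch below. This is not the CERN implementation: the function name `find_underutilized`, the 80% threshold, and the experiment numbers are all assumptions made for illustration. It only shows the idea of comparing usage against quota and flagging experiments that fall below a chosen fraction.

```python
def find_underutilized(usage, quota, threshold=0.8):
    """Return {experiment: utilization} for experiments using less than
    `threshold` of their quota.

    `usage` and `quota` map experiment name -> number of slots (or cores).
    Both the names and the threshold are hypothetical, for illustration only.
    """
    flagged = {}
    for experiment, slots_quota in quota.items():
        if slots_quota <= 0:
            # No quota configured: utilization is undefined, so skip it.
            continue
        utilization = usage.get(experiment, 0) / slots_quota
        if utilization < threshold:
            flagged[experiment] = round(utilization, 3)
    return flagged


# Made-up numbers, not real site data.
usage = {"atlas": 900, "cms": 400, "lhcb": 290}
quota = {"atlas": 1000, "cms": 1000, "lhcb": 300, "alice": 0}
print(find_underutilized(usage, quota))
```

In a real setup the flagged experiments would presumably feed an alert or a dashboard panel rather than a print statement.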
-
-
FNAL: added support for multiple schedds; quota collection (from accounting ClassAds, in case the negotiator cannot be contacted); config options to process these properly; changes still need to be pushed upstream
-
ClassAds with parameters that have a number in the name cause issues (WillSK)
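The quota fallback described for FNAL (use accounting ClassAds when the negotiator cannot be contacted) can be sketched as follows. This is an illustration under stated assumptions, not the FNAL code: the helper names are hypothetical, the ads are modelled as plain dicts, and the attribute names `IsAccountingGroup`, `Name`, `EffectiveQuota` and `ConfigQuota` are assumed to match what the pool publishes and should be checked against real accounting ads.

```python
def quotas_from_accounting_ads(ads):
    """Extract {group name: quota} from accounting ads given as dicts.

    Attribute names are assumptions; verify them against your pool.
    """
    quotas = {}
    for ad in ads:
        if not ad.get("IsAccountingGroup", False):
            continue  # per-user entries carry no group quota
        name = ad.get("Name")
        # Prefer the effective quota, fall back to the configured one.
        quota = ad.get("EffectiveQuota", ad.get("ConfigQuota"))
        if name is None or quota is None:
            continue
        quotas[name] = float(quota)
    return quotas


def collect_quotas(query_negotiator, query_accounting_ads):
    """Try the negotiator first; on any failure, use accounting ads.

    Both arguments are hypothetical callables supplied by the caller.
    """
    try:
        return query_negotiator()
    except Exception:
        return quotas_from_accounting_ads(query_accounting_ads())


def unreachable_negotiator():
    # Stand-in for a negotiator query that fails.
    raise OSError("negotiator not reachable")


# Made-up ads for illustration.
sample_ads = [
    {"Name": "group_cms", "IsAccountingGroup": True, "EffectiveQuota": 500.0},
    {"Name": "group_nova", "IsAccountingGroup": True, "ConfigQuota": 200},
    {"Name": "alice@example.org", "IsAccountingGroup": False},
]
print(collect_quotas(unreachable_negotiator, lambda: sample_ads))
```

Passing the two queries in as callables keeps the fallback logic testable without a running pool.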
-
-
Deployment with Docker? Dependency hell either way.
-
Python 3 migration? No one yet.
-
Elasticsearch / Filebeat configuration for the fifemon repo as well
-
Nicolas (CC-IN2P3): modification in jobs.py to monitor jobs submitted to the local schedd (as opposed to the CE)
-
(BNL) We also had to remove some FNAL-specific assumptions in the code
-
-
Include cgroups monitoring in fifemon?
-
(BNL) This is a separate tool for us, monitoring directly on the node (https://github.com/HEPiX-batchmonitoring/condor_graphite)
-
-
Mixed solutions for monitoring, with some overlap between what each tool can do
-
https://indico.cern.ch/event/778660/contributions/3245464/attachments/1770416/2876523/CERNHTCondorMonitoring.pdf
-
Common repository?