HEPiX Batch Monitoring Working Group

Fifemon used at

  • BNL: inject custom classads in hierarchy

  • CERN: working on packaging in RPM (will share); added metrics to monitor cluster utilization per experiment (and react on under-utilization w.r.t. quota)

    • Mix: fifemon + collectd based layer

  • FNAL: added support for multiple schedds; quota collection (from accounting classads, in case the negotiator cannot be contacted); config options to process properly; changes still need to be pushed upstream

    • Classads whose parameter names contain a number cause issues (WillSK)

  • Deployment with Docker? Dependency hell either way.

  • Python 3 migration? No one yet.

  • Elasticsearch / Filebeat configuration for fifemon repo as well

  • Nicolas (CC-IN2P3): modification in jobs.py to monitor jobs submitted to the local schedd (as opposed to the CE)

    • (BNL) We also had to remove some FNAL-specific assumptions in the code

  • Include cgroups monitoring in fifemon?

    • (BNL) Separate tool for us, direct monitoring on node (https://github.com/HEPiX-batchmonitoring/condor_graphite)

  • Mixed solutions for monitoring with some overlap between what each tool can do

  • https://indico.cern.ch/event/778660/contributions/3245464/attachments/1770416/2876523/CERNHTCondorMonitoring.pdf

  • Common repository?
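
The CERN item above (reacting to under-utilization with respect to quota) can be illustrated with a minimal sketch. It is not taken from fifemon or from the CERN setup: the function name `under_utilized`, the 0.8 threshold, and the experiment names and numbers are all hypothetical, chosen only to show the comparison.

```python
def under_utilized(usage, quota, threshold=0.8):
    """Return {experiment: usage/quota} for experiments below the threshold.

    usage and quota are plain dicts keyed by experiment name (hypothetical
    inputs; a real setup would fill them from collected metrics).
    """
    flagged = {}
    for experiment, q in quota.items():
        if q <= 0:
            continue  # no quota assigned, so there is nothing to compare against
        ratio = usage.get(experiment, 0) / q
        if ratio < threshold:
            flagged[experiment] = ratio
    return flagged


# Made-up numbers for illustration only.
usage = {"exp_a": 900, "exp_b": 300}
quota = {"exp_a": 1000, "exp_b": 1000, "exp_c": 0}
print(under_utilized(usage, quota))
```

With these inputs only `exp_b` is reported, since `exp_a` is above the threshold and `exp_c` has no quota.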
