Batch Operations Weekly

Europe/Zurich
31/S-023 (CERN)

Videoconference room: Batch_Operations_Weekly
Description: Batchops
Extension: 10748105
Owner: Gavin McCance
    • 14:00 - 15:00
      Agenda (1h)
      • Keep until 3rd September:
        • OTG0051379: Short interruption of some services in Vault
          • DONE BBC-2417: drain batch projects affected.
      • BBC-2183: Recreate CEs in Wigner.
      • BBC-1991: Puppet codebase quality
        • BBC-2423: htcondor module types (in progress)
      • All LSF queues stopped. Infra to be removed next Monday
      • Main collectors very busy
        • More sub-collectors deployed
        • Clients moving to UDP updates (QA done with no issues)
        • Collector VIEW_HOST policy changed to send Scheduler ClassAds; this enables moving schedds to a sub-collector
      • Thoughts about HANDLE_QUERY_IN_PROC_POLICY:
        • HANDLE_QUERY_IN_PROC_POLICY Docs.
        • It defines when the collector should fork. At the moment the collector DutyCycle is ~0.98. Is this a symptom of heavy in-process (InProc) queries keeping it busy? Should we fork more aggressively?
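To make the fork question above concrete, here is a minimal sketch of how the policy values could map to an "answer in process or fork a worker" decision. It is a toy model, not collector code: the function name `handled_in_process` is hypothetical, and what counts as a "small" table or query is defined in the HTCondor documentation and is deliberately passed in as a boolean rather than reproduced here.

```python
from enum import Enum


class Policy(Enum):
    """Value names accepted by HANDLE_QUERY_IN_PROC_POLICY."""
    ALWAYS = "always"
    NEVER = "never"
    SMALL_TABLE = "small_table"
    SMALL_QUERY = "small_query"
    SMALL_TABLE_AND_QUERY = "small_table_and_query"
    SMALL_TABLE_OR_QUERY = "small_table_or_query"


def handled_in_process(policy: str, table_is_small: bool, query_is_small: bool) -> bool:
    """Return True if the query would be answered in process, False if forked.

    Hypothetical helper: the two booleans stand in for the collector's own
    "small table" / "small query" tests, which are not modelled here.
    """
    chosen = Policy(policy.lower())
    if chosen is Policy.ALWAYS:
        return True
    if chosen is Policy.NEVER:
        return False
    if chosen is Policy.SMALL_TABLE:
        return table_is_small
    if chosen is Policy.SMALL_QUERY:
        return query_is_small
    if chosen is Policy.SMALL_TABLE_AND_QUERY:
        return table_is_small and query_is_small
    # Remaining case: SMALL_TABLE_OR_QUERY
    return table_is_small or query_is_small


# A small query against a large table: "or" keeps it in process, "and" forks.
in_proc_with_or = handled_in_process("small_table_or_query", False, True)
in_proc_with_and = handled_in_process("small_table_and_query", False, True)
```

Under this reading, a stricter policy (for example `small_table_and_query` or `never`) would push more queries to forked workers and take them off the main loop. Whether that actually lowers the DutyCycle is the open question and would need to be measured.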
      • EOS-3665: nodes spotted with caches filling the filesystem.
      • Misc:
        • AFS in bad shape in Point8 after the Neutron issues a few weeks ago. Nodes not running jobs were restarted; the others are marked for drain & reboot.
        • New project in Point8 (005): provisioning in progress.
        • More resources in Point8 (004).
        • More resources in Geneva (044).
      • Negotiate shared-library:
      • Haggis backend:
        • Added audit layer for every user interaction
        • Need to think about alarming for dangerous cases (quota changes, deletions, etc.):
          • Grafana alert?
          • Use EsAlerts?
          • Mattermost bot?
        • Suggestions here!
      • Haggis database:
        • In the context of auditing, I suggest adding two new columns to all tables so that we can more easily investigate future strange misbehaviours:
          • created: SQL timestamp default value
          • last modified: hard without using triggers. We would need to push the timestamp from layer 7 (the application layer), i.e. tatties & haggis.
            • We could move the repositories code into a shared-lib (the two applications' code is duplicated in this regard) and make the necessary change a one-line code addition in all the entities?
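The two-column proposal above could be sketched as follows. This is an illustration only, under loud assumptions: it uses an in-memory SQLite database rather than the real Haggis database, and the `quota` table, its columns, and the `BaseRepository` / `QuotaRepository` classes are hypothetical stand-ins for the shared-lib repositories. `created` is filled by a column default, while `last_modified` is stamped from the application layer in one shared place.

```python
import sqlite3
from datetime import datetime, timezone


def utc_now() -> str:
    """UTC timestamp in the same text format SQLite uses for CURRENT_TIMESTAMP."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


class BaseRepository:
    """Hypothetical shared-lib base class that both applications would reuse."""

    table = ""  # set by each concrete repository

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def update(self, row_id: int, **fields) -> None:
        # The one-line addition: every update also stamps last_modified.
        fields["last_modified"] = utc_now()
        # Column names come from code, never from user input.
        assignments = ", ".join(f"{name} = ?" for name in fields)
        self.conn.execute(
            f"UPDATE {self.table} SET {assignments} WHERE id = ?",
            (*fields.values(), row_id),
        )
        self.conn.commit()


class QuotaRepository(BaseRepository):
    table = "quota"  # hypothetical table name


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE quota (
        id            INTEGER PRIMARY KEY,
        compute_group TEXT NOT NULL,
        cores         INTEGER NOT NULL,
        created       TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- set by the database
        last_modified TEXT                                      -- set by the application
    )
    """
)
conn.execute("INSERT INTO quota (compute_group, cores) VALUES (?, ?)", ("group_a", 100))

QuotaRepository(conn).update(1, cores=200)
row = conn.execute(
    "SELECT cores, created, last_modified FROM quota WHERE id = 1"
).fetchone()
```

Keeping the stamp in the shared base class means neither application can forget it, at the cost of missing any write that bypasses the repositories (for example manual SQL), which a trigger would catch.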
      • HPC integration with Haggis:
        • Current LDAP query retrieves HPC-related compute groups, but the Haggis backend does not process them (BBC-2433). Currently investigating.
      • Can the schedd stop a submit request?