Rucio Meeting

Europe/Zurich
Martin Barisits (CERN)
Videoconference
Rucio Development Meeting
Zoom Meeting ID
413496641
Host
Martin Barisits
Alternative hosts
Cedric Serfon, Mario Lassnig, Dimitrios Christidis
Passcode
28849311
    • 15:00 15:05
      News 5m
      • 1.27.0 code freeze!
        • Merging is on the way!
        • 1.27.0rc1 either tomorrow or on Monday
    • 15:05 15:15
      Community News & DevOps roundtable 10m
      • ATLAS
        • Issue with Throttler
          • Requests remain in WAITING state forever --> #4979
          • Way to manually run the throttler to unblock it
        • Heartbeats
          • With frequent Kubernetes pod restarts, outdated heartbeats are often found
            • Ideally, pod shutdown should also remove the heartbeat, but this doesn't seem to happen --> #4988
      • CMS
        • CTA multihop transfer
          • More space on EOS to leave space for multihop
            • --> More data (and thus Jobs) sent to EOS
            • Possibly based on freespace weights on rules
              • Freespace weight could be adapted
                • ATLAS investigating if using relative freespace weights instead of absolute could be beneficial
      • Fermilab/DUNE/ICARUS/RUBIN
        • Staging failures on tape system
        • Number of files in RUBIN
          • Large number of files, might be an issue for transfers/tapes
      • Belle II
        • mod_gridsite issue
          • Writes into /var/cache/gridsite
            • Many files written there
        • More work on metadata
      • DUNE
        • Work on policy packages for the client
        • After that, going back to lightweight Rucio clients
      • Multi-VO
        • Conveyor submitter/poller
          • After discussion on Slack it works much better now
          • Radu and Tim implemented a fix which should be able to select the right certificate for the right VO
      • ESCAPE
        • DAC21 (Data and Analysis Challenge 21) next week
        • Currently two issues:
          • Hermes increases memory consumption until it crashes?
            • Memory consumption reported by Kubernetes metrics and by Prometheus does not add up?
          • Reaper greedy deletion
            • Found small unrelated bug which is fixed in helm-chart
            • LSST use cases --> want to reach 60k deletions/h
      • SKAO
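The difference between absolute and relative freespace weights, raised in the CMS item above, can be illustrated with a small worked example. This is only a sketch of the arithmetic, not Rucio's RSE selection code: the function names, the RSE names, and the numbers are all hypothetical.

```python
def absolute_weights(free):
    """Weight each RSE by its free space, normalised so the weights sum to 1."""
    total = sum(free.values())
    return {rse: value / total for rse, value in free.items()}


def relative_weights(free, capacity):
    """Weight each RSE by its free fraction (free / capacity), normalised to sum to 1."""
    fractions = {rse: free[rse] / capacity[rse] for rse in free}
    total = sum(fractions.values())
    return {rse: value / total for rse, value in fractions.items()}


# Hypothetical numbers in TB: a large, nearly full RSE and a small, mostly empty one.
free = {"BIG_RSE": 200.0, "SMALL_RSE": 80.0}
capacity = {"BIG_RSE": 2000.0, "SMALL_RSE": 100.0}

abs_w = absolute_weights(free)            # favours BIG_RSE: more free TB overall
rel_w = relative_weights(free, capacity)  # favours SMALL_RSE: 80% free vs 10% free

for rse in free:
    print(f"{rse}: absolute={abs_w[rse]:.3f} relative={rel_w[rse]:.3f}")
```

With absolute weights the large RSE keeps attracting most of the data even though it is 90% full; with relative weights the preference flips towards the RSE with the larger free fraction. Which behaviour is preferable depends on the site and is exactly the question under investigation.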
    • 15:15 15:20
      Component responsible update 5m
      Speaker: Martin Barisits (CERN)
    • 15:20 15:40
      Suspicious file recovery 20m
      Speakers: Christoph Ames (Ludwig Maximilians University Munich (DE)), Cedric Serfon (Brookhaven National Laboratory (US))
    • 15:40 15:45
      Container/helm-chart new rucio.cfg workflow 5m
      Speaker: Radu Carpa (CERN)
    • 15:45 15:55
      Developers roundtable 10m

      Rucio 1.27 "Batdonkey v. Superdonkey" priority followup

      • In Progress
        • Auditor overhaul #3437 [Dimitrios, Eric, Stefan] [Longer activity lasting beyond 1.27 release]
          • Let's schedule meeting in November
        • Logging review #4220 [Martin, Joel, All comp leads]
          • Would be good to get into 1.27
        • Quality of Service #3419 [Matt] [Beyond 1.27 release]
          • Demonstrator for storage-issued QoS changes for BNL MAS
          • DOMA QoS
        • Optimize database interactions #4793 [Martin, Mario, Radu] [Beyond 1.27 release]
          • Radu started to look into temp tables
            • Proof of concept works, but behaves differently :-) on Oracle
              • Global vs private temporary tables
                • Global needs to be part of schema
          • Meeting with CERN IT DB admins
            • Decrease transaction rate (7000 transactions/s)
              • Recently found an issue with session handling between API and Core
              • Will be addressed to largely reduce the number of empty, rolled-back sessions
            • Decrease LOGON rate (7-10 logons/s)
              • Optimize session pools
            • Temporary tables
        • Rename Daemons to commonly understandable names #4795 [Martin, Joel]
          • Add aliases now, remove them later (if at all)
          • If you have suggestions for proper names, please add them to the issue
            • Deadline for this next week
        • Prepare replacement of current policy import with policy packages #4798 [James, Martin] [Beyond 1.27 release]
          • Best method to get policy packages into container?
            • Probably not a good idea to pip-install them on startup
          • Fermilab + CMS builds separate containers including these
        • Enabling tests for different policy package #3878 [Mayank]
          • New GH action workflow
            • Tests are specified via build-matrix 
              • Specify modifications of rucio.cfg
              • Right now it assumes policy package is in container
      • Todo
        • Get SSO Login working [Rizart]
        • Down-scoped tokens for user interactions #4791 [] [Beyond 1.27 release]
        • Versioning for REST API #4796 [Ben, Martin] [Beyond 1.27 release]
      • Done
        • Disentangle fts3 specific code from conveyor and move to transfertool #857 [Radu]
        • helm-charts release management #4794 [Radu, Eric, Martin]
        • rucio.cfg vs config table #2630 [Mario, David]
      • Delayed
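
For readers unfamiliar with the temporary-table idea from the database optimization item above, here is a minimal sketch of the general pattern: load a batch of keys into a session-private temporary table and join against it, instead of issuing one lookup per key. The minutes do not say which queries the proof of concept targets, so the table and column names (`replicas`, `tmp_names`) and the helper `states_for` are hypothetical. SQLite from the Python standard library stands in for the production database, and its `CREATE TEMP TABLE` is created at runtime per connection, so it does not show the Oracle-specific point that a global temporary table must be defined in the schema beforehand.

```python
import sqlite3

# In-memory database standing in for the real one (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE replicas (name TEXT PRIMARY KEY, state TEXT)")
conn.executemany(
    "INSERT INTO replicas VALUES (?, ?)",
    [(f"file_{i}", "AVAILABLE" if i % 2 else "UNAVAILABLE") for i in range(1000)],
)


def states_for(conn, names):
    """Return {name: state} for the given names using a temporary table join."""
    # The temporary table is visible only to this connection.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS tmp_names (name TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM tmp_names")  # clear leftovers from a previous call
    conn.executemany(
        "INSERT OR IGNORE INTO tmp_names VALUES (?)", [(n,) for n in names]
    )
    rows = conn.execute(
        "SELECT r.name, r.state FROM replicas r JOIN tmp_names t ON t.name = r.name"
    ).fetchall()
    return dict(rows)


result = states_for(conn, ["file_1", "file_2", "file_999", "missing"])
print(result)
```

Names that are not present in `replicas` (here `missing`) simply drop out of the join, so the caller gets results only for known entries.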

      Developer roundtable

      • Metadata
        • PR needs merging; review is good to go
    • 15:55 16:00
      AOB 5m