Rucio Meeting

Europe/Zurich
Martin Barisits (CERN)
Zoom Meeting ID: 413496641
Host: Martin Barisits
Alternative hosts: Mario Lassnig, Cedric Serfon, Dimitrios Christidis
Passcode: 28849311
    • 15:00 15:05
      News 5m
      • 1.27.0 code freeze!
        • Merging is under way!
        • 1.27.0rc1 either tomorrow or on Monday
    • 15:05 15:15
      Community News & DevOps roundtable 10m
      • ATLAS
        • Issue with Throttler
          • Requests remain in WAITING state forever --> #4979
          • Way to manually run the throttler to unblock it
        • Heartbeats
          • With frequent Kubernetes pod restarts, outdated heartbeats are often found
            • Ideally, pod shutdown should also issue a heartbeat removal, but this does not seem to happen (#4988)
      • CMS
        • CTA multihop transfer
          • More space on EOS to leave space for multihop
            • --> More data (and thus Jobs) sent to EOS
            • Possibly based on freespace weights on rules
              • Freespace weight could be adapted
                • ATLAS investigating if using relative freespace weights instead of absolute could be beneficial
      • Fermilab/DUNE/ICARUS/RUBIN
        • Staging failures on tape system
        • Number of files in RUBIN
          • Large number of files, might be an issue for transfers/tapes
      • Belle II
        • mod_gridsite issue
          • Writes into /var/cache/gridsite
            • Many files written there
        • More work on metadata
      • DUNE
        • Work on policy packages for the client
        • After that, going back to lightweight Rucio clients
      • Multi-VO
        • Conveyor submitter/poller 
          • After discussion on Slack it works much better now
          • Radu and Tim implemented a fix which should be able to select the right certificate for the right VO
      • ESCAPE
        • DAC21 (Data and Analysis Challenge 21) next week
        • Currently two issues:
          • Hermes memory consumption increases until it crashes?
            • Kubernetes metrics and Prometheus memory consumption do not add up?
          • Reaper greedy deletion
            • Found a small unrelated bug, which is fixed in the helm-chart
            • LSST use cases --> want to reach 60k deletions/h
      • SKAO
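
The CMS item above mentions that rule placement may follow freespace weights, and that ATLAS is looking at relative instead of absolute weights. The sketch below only illustrates the difference. It assumes "absolute" means weighting by raw free space and "relative" means weighting by the free fraction of capacity. The function names, RSE names and numbers are hypothetical and not taken from Rucio.

```python
def absolute_weights(free):
    """Normalised weights proportional to raw free space per RSE."""
    total_free = sum(free.values())
    return {rse: value / total_free for rse, value in free.items()}


def relative_weights(free, capacity):
    """Normalised weights proportional to the free fraction (free / capacity)."""
    fractions = {rse: free[rse] / capacity[rse] for rse in free}
    norm = sum(fractions.values())
    return {rse: frac / norm for rse, frac in fractions.items()}


# Hypothetical endpoints, sizes in TB (assumed values for illustration only).
free = {"BIG_DISK": 1000, "SMALL_DISK": 200}
capacity = {"BIG_DISK": 10000, "SMALL_DISK": 400}

abs_w = absolute_weights(free)            # favours the endpoint with most free TB
rel_w = relative_weights(free, capacity)  # favours the endpoint that is least full

for rse in free:
    print(f"{rse}: absolute={abs_w[rse]:.3f} relative={rel_w[rse]:.3f}")
```

With these assumed numbers, absolute weighting sends most data to the large endpoint even though it is 90% full, while relative weighting prefers the small endpoint that is half empty.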
    • 15:15 15:20
      Component responsible update 5m
      Speaker: Martin Barisits (CERN)
    • 15:20 15:40
      Suspicious file recovery 20m
      Speakers: Christoph Ames (Ludwig Maximilians University Munich (DE)), Cedric Serfon (Brookhaven National Laboratory (US))
    • 15:40 15:45
      Container/helm-chart new rucio.cfg workflow 5m
      Speaker: Radu Carpa (CERN)
    • 15:45 15:55
      Developers roundtable 10m

      Rucio 1.27 "Batdonkey v. Superdonkey" priority followup

      • In Progress
        • Auditor overhaul #3437 [Dimitrios, Eric, Stefan] [Longer activity lasting beyond 1.27 release]
          • Let's schedule meeting in November
        • Logging review #4220 [Martin, Joel, All comp leads]
          • Would be good to get into 1.27
        • Quality of Service #3419 [Matt] [Beyond 1.27 release]
          • Demonstrator for storage-issued QoS changes for BNL MAS
          • DOMA QoS
        • Optimize database interactions #4793 [Martin, Mario, Radu] [Beyond 1.27 release]
          • Radu started to look into temp tables
            • Proof of concept works, but behaves differently :-) on Oracle
              • Global vs private temporary tables
                • Global needs to be part of schema
          • Meeting with CERN IT DB admins
            • Decrease transaction rate (7000 transactions/s)
              • Recently found an issue with session handling between API and Core
              • Will be addressed to largely reduce the number of empty rolled-back sessions
            • Decrease LOGON rate (7-10 logons/s)
              • Optimize session pools
            • Temporary tables
        • Rename Daemons to commonly understandable names #4795 [Martin, Joel]
          • Add aliases now, remove them later (if at all)
          • If you have suggestions for proper names, please add them to the issue
            • Deadline for this next week
        • Prepare replacement of current policy import with policy packages #4798 [James, Martin] [Beyond 1.27 release]
          • Best method to get policy packages into container?
            • Probably not a good idea to pip-install them on startup
          • Fermilab + CMS builds separate containers including these
        • Enabling tests for different policy packages #3878 [Mayank]
          • New GH action workflow
            • Tests are specified via build-matrix 
              • Specify modifications of rucio.cfg
              • Right now it assumes policy package is in container
      • Todo
        • Get SSO Login working [Rizart]
        • Down-scoped tokens for user interactions #4791 [] [Beyond 1.27 release]
        • Versioning for REST API #4796 [Ben, Martin] [Beyond 1.27 release]
      • Done
        • Disentangle fts3 specific code from conveyor and move to transfertool #857 [Radu]
        • helm-charts release management #4794 [Radu, Eric, Martin]
        • rucio.cfg vs config table #2630 [Mario, David]
      • Delayed
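
The policy-package test item above (#3878) describes build-matrix entries that specify modifications of rucio.cfg. A minimal sketch of that idea follows, assuming each matrix entry carries a dictionary of section/option overrides applied to a base configuration. The matrix layout, the entry names, the `render_cfg` helper and the option values are hypothetical and do not reflect the actual workflow.

```python
import configparser
import io

# Hypothetical base configuration; sections and values are placeholders.
BASE_CFG = """
[policy]
package = default_policy

[common]
loglevel = INFO
"""

# Hypothetical build matrix: one entry per test run, each with cfg overrides.
MATRIX = [
    {"name": "default", "overrides": {}},
    {"name": "custom-policy", "overrides": {"policy": {"package": "example_policy"}}},
]


def render_cfg(base_text, overrides):
    """Return the base config text with the given overrides applied."""
    parser = configparser.ConfigParser()
    parser.read_string(base_text)
    for section, options in overrides.items():
        if not parser.has_section(section):
            parser.add_section(section)
        for key, value in options.items():
            parser.set(section, key, value)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# One rendered config per matrix entry, keyed by entry name.
rendered = {entry["name"]: render_cfg(BASE_CFG, entry["overrides"]) for entry in MATRIX}
print(rendered["custom-policy"])
```

As noted in the minutes, this kind of setup only changes the configuration. The policy package itself is still assumed to be present in the container.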

      Developer roundtable

      • Metadata
        • PR needs merging; review is good to go
    • 15:55 16:00
      AOB 5m