Rucio Development Meeting

Europe/Zurich
Martin Barisits (CERN)
    • 1
      News
      • Full Tele-Working at CERN (Safe Mode)
      • Rucio Workshop
        • Given the COVID-19 situation and the many remote participants, the workshop went very well
        • 80 participants
        • 29 presentations
      • Coding Camp
        • Was cancelled
        • Hopefully we can have another one later this year
      • Release schedule
        • 1.21.12 was released on Monday
        • 1.22 "Green Donkey"
          • Code Freeze since Monday until ~Mar-27
          • 1.22.0rc1 Today
            • More RCs as needed
          • Friday Mar-27 1.22.0 final
        • 1.22.1 patch release on Apr-06
      • Next week 1.23 "The incredible Donkey" release planning
      • New CERN technical student
        • Ben
    • 2
      News from the experiments
      • ATLAS
        • EOS CTA tests from SFO
        • New HTTPD server configuration
          • Was running in prefork MPM mode, switched to event MPM mode
            • Better performance
          • Issue: Multiple mod_wsgi containers with separate session pools
            • Not well handled in default config
            • Added wsgi_groups and linked all endpoints
            • Shared connection pool across all REST endpoints
              • Much better performance
          • Suggestion: Make this the default!
      • CMS
        • Successfully tested multihop
        • Issue
          • For multihop, the 2nd transfer is submitted (and fails) before the first one has finished
          • Possible FTS bug?
          • Cedric will have a look and test it on ATLAS again
      • MultiVO
        • DB schema upgrade is merged
      • Belle II
        • Started to work on chained subscription mode
        • First prototype next week
      • Globus Online
        • Reaper development for GO
        • Deletion enhancement (2 phases?)
        • Testing to start with ATLAS
        • Lessons learned then applicable for light sources too
      • LDMX
        • Rucio to be used as a data catalog/metadata catalog
        • Add more operators (more than equals) to metadata queries
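The LDMX request for operators beyond equality in metadata queries could look roughly like the sketch below. This is an illustration only, not Rucio's implementation: the filter syntax, the operator set, and the names `parse_filter` and `matches` are all assumptions made for this example.

```python
import operator
import re

# Hypothetical operator table; not taken from Rucio.
OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "!=": operator.ne,
    "=": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}

# Two-character operators come first in the alternation so ">=" is not read as ">".
_TERM = re.compile(r"^\s*(\w+)\s*(>=|<=|!=|=|>|<)\s*(.+?)\s*$")


def _coerce(value):
    """Treat the value as a number when possible, otherwise keep it as a string."""
    try:
        return float(value)
    except ValueError:
        return value


def parse_filter(expression):
    """Turn 'run>=100,beam=electron' into a list of (key, compare, value) terms."""
    terms = []
    for raw in expression.split(","):
        match = _TERM.match(raw)
        if match is None:
            raise ValueError(f"cannot parse filter term: {raw!r}")
        key, op, value = match.groups()
        terms.append((key, OPERATORS[op], _coerce(value)))
    return terms


def matches(metadata, terms):
    """Return True only if the metadata dict satisfies every term."""
    for key, compare, wanted in terms:
        if key not in metadata:
            return False
        try:
            if not compare(metadata[key], wanted):
                return False
        except TypeError:
            # Ordering comparison between incompatible types, e.g. str vs. number.
            return False
    return True
```

With this sketch, `matches({"run": 120, "beam": "electron"}, parse_filter("run>=100,beam=electron"))` evaluates both terms and requires all of them to hold.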
    • 3
      Hot topics
      • Globus Online Testing
        • Support implemented in Rucio
        • Needs testing within ATLAS
          • Start to test transfers first, later on also do deletion
        • Currently ATLAS:
          • Harvester manages transfer to Globus endpoint
          • Would like to use Rucio to manage these transfers since US HPC facilities might move away from offering GridFTP endpoints
        • Questions/Issues/Ideas
          • Authentication
            • Each transfer has its own refresh token
              • Matt: There is a forever token tied to the auth_client - So no individual tokens needed
            • Need to handle Globus credentials (Safe way to inject/store)
            • How to handle this in the Multi-VO world?
              • Right now: Different submitters for each VO
          • Integration Testbed
            • Need to populate it with some data
            • Should be integrated to Panda testtools as well
            • 2 Endpoints we can use: SLAC and BNL (dual use)
              • No GLOBUS only endpoints yet
            • Will set up daemon on ATLAS integration cluster
          • Transfer/Scale testing
            • Once we are happy with transfers, also test reaper
          • Need to plan for 2 rules
            • WLCG <-> EDGE <-> GLOBUS
            • No multi-hop for this (Yet?)
            • EDGE node supports both WLCG world and GLOBUS world
          • Transfer limitations in GO
            • 100 transfers concurrently
            • Need to bulk them (Is there a bulk limit? 1Mio per transfer?)
            • Can submit more, GO will queue them?
          • Monitoring
            • Need to get the transfer events for GO also into the CERN monit stream
        • Further communication and followup on Slack: https://rucio.slack.com/archives/C8Z07UWKV
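The bulking idea from the "Transfer limitations in GO" item can be sketched as follows. This is a planning helper only and calls no Globus API. The limit of 100 concurrent transfers is the figure quoted in the notes, the per-task bulk size is still an open question there, and the names `chunked` and `plan_submissions` are hypothetical. Holding batches back locally is one option; the notes also raise the possibility that GO queues excess submissions itself.

```python
from itertools import islice


def chunked(items, size):
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def plan_submissions(files, files_per_task, active_tasks, max_concurrent=100):
    """Split files into bulk tasks and decide which ones fit under the limit.

    Returns (submit_now, held_back): two lists of batches. `max_concurrent`
    defaults to the 100 concurrent transfers mentioned in the notes.
    """
    batches = list(chunked(files, files_per_task))
    free_slots = max(0, max_concurrent - active_tasks)
    return batches[:free_slots], batches[free_slots:]
```

For example, with 98 tasks already active, `plan_submissions(files, 100, 98)` on 250 files would submit two batches now and hold the third back until a slot frees up.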
    • 4
      Developers roundtable
      • Issue burn update
      • Rucio 1.22 "Green Donkey" priority development followup
        • In Progress
          • Kubernetes [Thomas]
            • Got the new network controller! Single-Zone calico network controller (Nodes need to be in one availability zone though)
              • Was mapped manually by Ricardo
              • Improved the queries, but still not the same as puppet servers
            • With the change to MPM event mode, performance gets quite close to the puppet servers
            • Problem getting logs out of the server on the old cluster (network was too slow)
            • CERN IT team working on multi zone network controller
            • Once these issues are fixed, we can move more services to 
          • RSEMgr v2.0 #3147 [Tomas, Tobi]
            • Upload migration PR has been open for some time, probably needs to be re-submitted for testing
          • Unification of metadata #3096 [Aris]
            • Rucio core part is mostly done
            • API/CLI work needs to be done
              • Re-use existing calls
        • ToDo
          • Documentation [Martin]
          • rucio.cfg vs config table #2630 [Mario]
          • Versioned history tables need explicit definition #2063 [Martin]
          • Handling of Archives in Reaper #1431 [Cedric, Thomas, Mario, Martin]
          • Missing features for Archives #1091 [Mario, Martin]
          • Documentation page listing config table and RSE attributes #2631 [Mario]
          • Operators doc and repo #2636 [Martin]
            • Dimitrios hopes to start an effort mid-February
              • Some ATLAS parts might be required to stay confidential
        • Done
          • MultiVO #2635 [Ian, Eli, Patrick]
            • Open PR works fine now; will be merged into the upcoming feature release
            • Andrew prepared a few other things which can be improved and submitted incrementally (probably)
          • Repository cleanup #5252 [Martin, Thomas]
          • Configuration / Permissions / Policy #533 [James]
          • Cleanup of Issues [Everyone]
          • Reaper 2.0 #2412 [Cedric, Thomas]
          • Protection of sources too strict in reaper #1637 [Cedric, Thomas]
          • OIDC #2612 [Jaroslav]
          • Reaper should stop in case of judge backlog #1578 [Martin]
          • Add file to content history in reaper #37 [Cedric, Thomas]
            • Done, but one bug to fix due to contents_history partitioning
      • Auditor
        • Contacted by CMS who want to continue on Auditor work
        • Dimitrios and Thomas will try to give pointers, probably should re-write some parts of the auditor into an auditor2
      • Third party copy read and write needs to be enabled [Martin, Cedric, Dimitrios]
        • Duplicate the information for both operations in the probe for now
        • Need to iterate with AGIS to get TPC_write and _read in there
        • No news
      • CRIC progress [Panos]
      • Review probes [Thomas, Dimitrios]
        • Dimitrios needs some help with reviewing the probes, since he is overloaded
        • Thomas will help with reviewing
        • No news :-)
      • RPG development [Tomas]
        • Tomas will write document about project definition for RucioRPG
        • Needs more discussion
      • Issue with Reaper2 which unnecessarily deletes secondary replicas due to replicas in wrong state
      • staging_started_at and staging_finished_at working now?
        • Need to check
    • 5
      AOB