
Rucio Meeting

Europe/Zurich
Martin Barisits (CERN)
Zoom Meeting ID
413496641
Host
Martin Barisits
Alternative hosts
Mario Lassnig, Cedric Serfon, Dimitrios Christidis
Passcode
28849311
    • 15:00 15:05
      News 5m
      • March meeting schedule
        • Mar-07
        • Mar-14
        • Mar-21
        • Mar-28
      • Rucio 34 "Donkey Potter and the Data Cache" release schedule
        • Code Freeze March 08 12:00
        • RC1: March 12
        • Final: March 19
      • New team member: Hugo
    • 15:05 15:35
      DC24: Rucio Retrospective 30m

      Discussion topics:

      • Improve Submitter (and co.) performance (#6505)
        • Also affects conveyor-preparer and conveyor-stager
        • Does not make sense to submit transfers for rules which are already expired
        • A hotfix was made (for ATLAS) and will be pushed upstream soon
        • CMS:
          • Submitting blocks (datasets) with big files -> was OK
          • Submitting (datasets) blocks with small files -> saw this problem as well
      • Improve Cleaner performance (#6511)
        • Cleaner has a hard time cleaning ongoing rules
        • Needs a small re-design of the cleaning workflow
      • Prevent overlap between Reaper threads (#6512)
        • Bulk size is configurable; however, the last_updated time of replicas is not updated
        • Update remaining replicas during the loop
        • CMS:
          • Lots of "deletion not found" errors due to this
          • Manual solution: Partition reapers
      • DUNE
        • Channels clogged up badly due to misconfiguration / misunderstanding of the configuration
          • Ability to warn that one link is clogging the system
            • Possibly doable via the new /metrics endpoint
        • Scaling of frontend httpd servers
          • Worked well in handling the load
          • Pod config similar to the ATLAS one: running 12 pods with the default mod_wsgi config
          • Other daemons were scaled up a bit as well (similar to the ATLAS config)
        • Timeouts when calling the metacat server via the policy
          • Not clear that it comes from metacat
        • Standard consistency tools (dark data, etc.) would be helpful
      • Scaling of daemons/services
        • Submission of lots of small files
          • Scaling up from 3 pods to 8 pods worked for CMS
          • Kubernetes Auto-Scaling could solve this, but it is not trivial
          • No clear recommendation on how many pods/threads are needed for which load -> mostly experimental
      • Rucio tries to delete replicas that were never transferred
        • Clearly useful to avoid the creation of dark data, but affects deletion performance in artificial situations like this
        • Replica never transferred -> replica goes to UNAVAILABLE -> goes to reaper
      • Tokens
        • Polling for about-to-expire tokens was the biggest problem; something to address in the next version of FTS
        • Katy: Repeat a token-specific mini challenge later on
          • Yes, a new release of FTS should come in April
        • Mini challenges will also need to be done for the Rucio upload/download workflow
        • The issue of the number of refresh tokens stored in IAM needs to be addressed
        • Alternative IdPs (CILogon) need to be tested as well
          • VO Mapping in CiLogon?
        • IAM:
          • Could there be a procedure to wipe the DB (if clogged)?
        • Coarseness of token scopes is still a hot topic
        • CMS
          • Rucio worked really well and the injector tool was excellent!
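      The Reaper overlap discussed under #6512 can be illustrated with a small sketch. It is not Rucio's reaper code: `Replica`, `in_partition`, `select_batch`, `process_batch`, the `grace` window and `refresh_every` are hypothetical names, and an in-memory list stands in for the replicas table. A thread that has selected a bulk of replicas refreshes `last_updated` on the ones still pending, so a concurrent selection that skips recently updated rows does not pick them again; `in_partition` shows the manual alternative of statically partitioning replica names across reapers by hash.

```python
import zlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, List


@dataclass
class Replica:
    # Hypothetical stand-in for a replica row; not Rucio's actual model.
    name: str
    last_updated: datetime


def in_partition(name: str, worker: int, total_workers: int) -> bool:
    """Manual partitioning: each worker only handles names hashing to its slot."""
    # crc32 is deterministic across processes, unlike Python's built-in hash().
    return zlib.crc32(name.encode()) % total_workers == worker


def select_batch(replicas: List[Replica], bulk_size: int, grace: timedelta,
                 now: datetime, worker: int = 0,
                 total_workers: int = 1) -> List[Replica]:
    """Pick the least recently updated replicas, skipping any touched within `grace`."""
    candidates = [
        r for r in replicas
        if now - r.last_updated >= grace
        and in_partition(r.name, worker, total_workers)
    ]
    candidates.sort(key=lambda r: r.last_updated)
    return candidates[:bulk_size]


def process_batch(batch: List[Replica], delete: Callable[[Replica], None],
                  clock: Callable[[], datetime], refresh_every: int = 100) -> None:
    """Delete a batch, bumping last_updated on still-pending replicas every
    `refresh_every` deletions so other threads keep skipping them."""
    for i, replica in enumerate(batch):
        if i % refresh_every == 0:
            ts = clock()
            for pending in batch[i:]:
                pending.last_updated = ts
        delete(replica)
```

      Refreshing the pending rows keeps the existing "least recently updated first" selection while marking the bulk as claimed; static partitioning avoids overlap entirely, at the cost of fixing the number of workers up front.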
    • 15:35 15:45
      Community News & DevOps roundtable 10m
    • 15:45 15:55
      Developers roundtable 10m
      • Rucio 34 "Donkey Potter and the Data Cache" roadmap
        • In Progress
          • foreign key error on deleting dids in reaper #5733 [Alex]
          • factorize duplicate messaging code into a common module or class #6423 [Alex]
          • Deployment and Release Workflow #401 [Mayank, Eraldo]
          • Missing WebUI Release 33 page tracker #301 [Mayank, Eraldo]
          • Migrate Dashboard to Clean Architecture #158 [Mayank, Eraldo]
          • Unable to Delete File DID via Undertaker #5154 [Riccardo]
          • Type annotate the code #6454 [Riccardo]
          • Update extension for v32 (and higher) compatibility #25 [Francesc, Enrique]
        • In Review
          • Continue migration to SQLAlchemy 2.0 syntax #6057 [Erling]
            • Split into 5 PRs due to size (two are submitted upstream; the other 3 can be pushed)
          • Refactor policy package algorithm code #6382 [James]
          • Metadata for tape co-location and transfer priority #6398 [Maggie]
          • Update/Re-design core.meta module #5224 [Maggie, Rob]
        • Todo
          • bridge the gap between running rucio in demo env and full production deployment #187 [Radu, Enrique]
            • Needs somebody else on this, now that Radu has left
        • Done
          • Add Token based TPC tests to the CI #6451 [Radu]
        • Delayed
      • Documentation corner
        • Documentation and dev guidelines for Mypy type annotations #116 [Mayank, Martin]
        • Document environmental variables affecting the client #171 [Dimitrios]
        • Improve documentation on rucio.cfg vs configuration table #183 [Radu]
        • Add an FAQ-style entry aimed at users for STUCK rules #184 [Fabio]
        • Add instruction about DB partitioning #185 [Martin]
        • bridge the gap between running rucio in demo env and full production deployment #187 [Radu]
        • Introduce documentation on subscriptions #190 [Cedric]
        • WebUI: Improve Docs #255 [Eraldo]
        • Add instructions for Mac Apple Silicon in the developer section #261 [Eraldo]
          • Under Review - Comments posted, needs iteration
        • Add Rucio QoS RSE description and instructions #268 [Matt]
          • Under Review - Comments posted, needs iteration
        • Document how to set up command line argument completion #275 [Bouwe]
        • Formatting / style guide #287 [???]
        • Document how deletion occurs. #288 [???]
      • Other topics
    • 15:55 16:00
      AOB 5m