Rucio Meeting

Europe/Zurich
Martin Barisits (CERN)
Zoom Meeting ID: 69426538800
Host: Martin Barisits
Passcode: 91434731
    • 15:00 15:05
      News 5m
      • 39.1.0 moved to next week
    • 15:05 15:25
      Community News & DevOps roundtable 20m
      • ATLAS
        • Issues with list_dataset_replicas: a wrong result was returned twice (when not using --deep)
          • Did the re-sync script not run?
          • Re-evaluate if --deep can be used again? (Possibly introduce some caching)
      • CMS
        • list_datasets_rse does not return
          • 4PB RSE
          • Relies on collection_replicas 
          • We need a permanent solution for collection_replicas (normal) vs --deep, also with respect to list_datasets_rse
      • Fermilab DUNE / RUBIN / ...
        • DUNE is migrating to load balancers to see if this is the cause of the problems with job submission
        • Concern about client-certificate removal
          • Not sure what we can do on Rucio side (other than push for tokens)
          • RUBIN might have a Plan B, DUNE still investigating
      • ESCAPE / EOSC
        • Rucio extension now in production in SWAN @ CERN
        • Questions about istape attribute
      • INFN Datalake
        • Monitoring templates fix
        • Updating development instance
      • RI-SCALE
      • OSCARS MADDEN / ETAP
        • Metadata storage
      • DaFab
      • IHEP
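The caching idea raised under ATLAS for --deep listings could look roughly like the sketch below. It is a minimal time-based cache, not Rucio code: `TTLCache`, `fake_deep_listing`, the 300-second lifetime, and the example scope, dataset, and RSE names are all hypothetical, and the real deep listing would replace the stand-in function.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny time-based cache: an entry is reused until it is older than ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the expensive call
        value = compute()  # missing or expired: recompute and store
        self._entries[key] = (now, value)
        return value


def fake_deep_listing(scope: str, name: str) -> list:
    """Hypothetical stand-in for an expensive deep replica listing."""
    return [{"scope": scope, "name": name, "rse": "EXAMPLE_DISK"}]


cache = TTLCache(ttl_seconds=300)
key = ("user.example", "dataset_1")
first = cache.get_or_compute(key, lambda: fake_deep_listing(*key))
second = cache.get_or_compute(key, lambda: fake_deep_listing(*key))  # served from cache
```

The trade-off is staleness: within the lifetime, a cached answer can lag behind the catalogue, so the lifetime would need to be chosen against how quickly replica states change.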
    • 15:25 15:55
      Developers roundtable 30m
      • Conventional commits to be enabled this Monday
        • Old PRs grandfathered in, new PRs must follow CC
        • See documentation
      • PR Crisis: We have way too many open PRs, and the situation is not getting better
        • This is due to a combination of low-quality PRs, PRs that are difficult to review, bottlenecks in reviewing, and bottlenecks in merging
        • We will address this by one immediate action (PR Sprint, see later) and several changes in our development processes
        • PR Sprint
          • This Monday we will start a PR Sprint to rigorously reduce the number of open PRs
            • Focus on review and merging
            • Close stale things (they can be reopened)
            • EVERYONE: Please react quickly to comments
        • Changes in processes
          • Create a new PR template which outlines the most important points for a developer
            • Testing, coverage, contribution guide, author should pick a reviewer, etc.
            • This should make the job of a reviewer easier
          • Create a new reviewers guide
            • To guide the reviewer in reviewing a PR 
            • This will have a review template where the reviewer needs to answer some questions
              • This should help the merger to understand the level of confidence the reviewer has in the PR
          • We will (slowly) expand the merger team
          • We will introduce automation (RucioBot) of PRs
            • Stale PRs / Failing tests -> Bot will auto-close if not addressed quickly
            • PR without issue -> Close
            • Draft PRs after no activity -> Close
            • The Bot should, as much as possible, enforce the contribution guide automatically
          • For LARGE and XLARGE issues, a reviewer needs to be picked on creation of the issue
            • In principle, already part of our guideline, I will enforce this strictly on roadmap issues now
            • Discuss the planned changes and the processes with the reviewer (who acts more like a co-author) early on
              • This should avoid situations where a PR goes in the wrong direction
          • Review needs to be part of the development process. Every developer should spend some time reviewing PRs!
      • Reflection about ongoing Sprint 1 (Sprint Board) [Karan]
        • Less activity, but overall good
      • Rucio 40 roadmap
        • Please size estimate all the priority issues!
        • Pick a reviewer for LARGE and XLARGE issues!
      • Rucio 40 priority followup
        • Todo
          • Create a new API endpoint for the clients to request tokens #6638 [Dimitrios]
          • Implement new token authentication for download #7029 [Dimitrios]
          • More useful client docstrings and CLI help messages (Target: 15/25) #363 [Dimitris]
          • Consider migrating from Jobber to Cloudprober #152 [Eric]
          • DIDs not being shown correctly in the extension after a successful download #95 [Giovanni]
          • Make available + Add to Notebook features failing #83 [Giovanni]
          • Token refresh after expiration #73 [Giovanni]
          • Prevent multiple Reaper threads from working on the same replicas #6512 [Hugo]
          • Test stability on LTS branches #7964 child of #7667 [Karan]
          • Generate "Configuration parameters" documentation page automatically, to avoid mismatches between documentation and code #325 [Maggie]
          • Change mixed prometheus_client and probe_metric approaches to use PrometheusPusher #129 [Maggie]
          • Move implementation of CLI to new CLI structure #8295 [Maggie]
          • Possible Belle II specific code in DIRAC functionality #7824 [Max]
          • Remove hard-coded one-day lifetime in DIRAC API #8172 [Max]
          • Simplify what we run on CI (test suites, OS, Python versions, DB, …) #7965 child of #7667 [Mayank, Karan]
          • [EPIC] Mutation Operations and Role Based Access to pages/features (Target 5) #622 [Mayank]
        • In Progress
          • Consider using Python venvs in containers to avoid conflicts with system-installed packages #458 [Ben]
          • Do not use regex to split did, use scope extraction method #7519 [James]
          • [EPIC] UX improvements (Target 22) #621 [Mayank]
        • In Review
          • Add startup self-check mechanism to block Rucio services when critical diagnostics fail #8197 child of #8011 [Dimitris]
          • Ensure PostgreSQL ENUM types created during Alembic migrations honor the configured schema #8145 child of #7737 [Dimitris]
          • Wrong documentation for dids/scope/name/files call #8053 [Dimitris]
          • Testing: Make tools/run_tests.sh idempotent #7737 child of #7667 [Dimitris]
          • No way to check or change ownership of a scope using Rucio client #7830 [Maggie]
        • Done
          • Listing containers with more than 10k entries breaks the kernel #102 [Giovanni]
          • Rucio silently skips importing configured policy package in case of errors #7962 [Max, James]
        • Delayed
          • Stop auto-forwarding of old-style CLI commands #8294 [Maggie]
      • Other discussion
        • Realistic lower dependency bounds for the Rucio clients (for PyPI)
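The RucioBot rules listed under "Changes in processes" can be read as a small decision function. The sketch below only illustrates that reading and is not the bot's implementation: the `PullRequest` fields, the `triage` function, the returned labels, and the 14-day grace period are all hypothetical, since the notes only say "quickly" and "after no activity".

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    """Hypothetical summary of the PR state the bot would inspect."""
    has_linked_issue: bool
    is_draft: bool
    tests_passing: bool
    days_inactive: int


def triage(pr: PullRequest, grace_days: int = 14) -> str:
    """Return the action suggested by the rules in the notes.

    grace_days is an assumed threshold, not a value from the meeting.
    """
    if not pr.has_linked_issue:
        return "close: no linked issue"  # PR without issue -> close
    inactive = pr.days_inactive > grace_days
    if pr.is_draft and inactive:
        return "close: inactive draft"  # draft PR after no activity -> close
    if not pr.tests_passing and inactive:
        return "close: failing tests not addressed"
    if inactive:
        return "close: stale"
    return "keep"


example = PullRequest(has_linked_issue=True, is_draft=False,
                      tests_passing=False, days_inactive=3)
action = triage(example)  # failing tests, but still within the grace period
```

Checking for a linked issue first mirrors the notes, where that rule carries no time condition; the other rules only fire once the grace period has passed, and a closed PR can still be reopened.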
    • 15:55 16:00
      AOB 5m