Rucio Development Meeting

Europe/Zurich
Martin Barisits (CERN)
Description

Video Conferencing: Please join Zoom instead of Vidyo!

Zoom: https://cern.zoom.us/j/413496641

Meeting ID: 413 496 641
Find your local number: https://cern.zoom.us/u/aT2QQfXAo

    • 15:00 - 15:10
      News 10m
    • 15:10 - 15:20
      News from the experiments 10m
      • ATLAS
        • Found a bug in rucio upload
          • When doing a local upload in the LAN domain, the RSE name is not forwarded correctly
          • Probably related to https://github.com/rucio/rucio/issues/3312
          • Dimitrios will have a look
      • CMS
        • Argument for reaper2 to work only on certain DID patterns
          • Useful for everyone, so a PR is welcome
        • Authentication issues in the WebUI
          • The hardcoded /ui/ path should be removed
          • Should be fixed in 1.22.3
          • Switching users on the UI homepage raises a JavaScript error
            • Might be an issue with the proxy setup of the WebUI httpd
            • Thomas will look into it
      • ESCAPE
        • Changing the schema to allow slashes in DIDs as well
        • Upgrading to 1.22.2
          • OIDC authentication is useful for ESCAPE
      • Belle II
        • Rucio file catalog in BelleDIRAC
        • Developing a monitoring tool for Belle II
      • MultiVO
        • Upgrade of the RAL instance to 1.21.*
      • DUNE/Edinburgh
        • Looking into Auditor operations with object stores
      • DIRAC-Integration
        • A test version of the Rucio server is installed
        • Implementing API functions in DIRAC
        • Work will go into the vanilla DIRAC main branch
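The CMS request for a reaper2 argument that restricts it to certain DID patterns can be pictured as a small filter over deletion candidates. This is a minimal sketch under stated assumptions: the function name filter_by_did_patterns, the (scope, name) tuples, and glob-style matching on "scope:name" are illustrative choices, not the reaper's actual interface.

```python
import fnmatch
from typing import Iterable, List, Optional, Tuple

# A deletion candidate as (scope, name); a simplified, hypothetical shape.
Replica = Tuple[str, str]


def filter_by_did_patterns(replicas: Iterable[Replica],
                           patterns: Optional[List[str]] = None) -> List[Replica]:
    """Keep only candidates whose 'scope:name' matches one of the glob patterns.

    Without patterns everything is kept, i.e. the reaper would work on all DIDs.
    """
    replicas = list(replicas)
    if not patterns:
        return replicas
    return [
        (scope, name)
        for scope, name in replicas
        # fnmatchcase is case-sensitive, like DID names.
        if any(fnmatch.fnmatchcase(f"{scope}:{name}", p) for p in patterns)
    ]


if __name__ == "__main__":
    candidates = [("cms", "/store/data/run1.root"),
                  ("cms", "/store/mc/sim1.root"),
                  ("user.jdoe", "test.txt")]
    print(filter_by_did_patterns(candidates, ["cms:/store/mc/*"]))
```

Note that fnmatch's `*` also matches `/`, which matters once DIDs with slashes (as requested by ESCAPE) are allowed.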
    • 15:20 - 15:30
      Hot topics 10m
    • 15:30 - 15:55
      Developers roundtable 25m
      • Burndown chart and progress
      • 1.23.0 LTS "The Incredible Donkey" priority followup
        • In Progress
          • Documentation overhaul [Martin, Dimitrios]
            • Page Listing config table and RSE Attribute Parameters #2631 [Martin]
            • Operators Documentation and recipe repository #2636 [Martin]
            • Early phase: picking tools and deciding on structure and content
              • Separation between generic and VO-specific content
            • Possible discussion in 2 weeks so that everyone can comment
          • Expand Kubernetes Usage [Thomas]
            • Waiting for Ricardo for the node investigation
            • Reaper2 memory usage increases constantly until the limit is hit, and then it restarts
              • Confirmed by CMS too
                • ~50 RSEs processed in reaper
              • ATLAS made a big jump to 300+ RSEs
              • Being investigated
              • Check memory usage
              • ATLAS now sees this on reaper1 as well
                • Related to gfal?
                • Needs follow-up
            • Debug features with attachable containers coming soon
            • MultiZone cluster available now
            • Increasing cluster size next week
          • AAI/OIDC Testing and Improvements [Jaroslav]
            • Testing propagation of the account to the transfertool
            • New patch release to deploy the recent developments on the WLCG DOMA cluster
          • MultiVO Functionality #2635 [Eli]
            • Bringing work up to date
            • Meeting later on to specify next steps
            • Discussion: Administration of different VOs
              • Securing VOs, Accounts etc.
            • List of code parts that need specific changes to enable Multi-VO
          • Unification of metadata interfaces #3096 [Aris]
            • PR submitted, waiting for comments
        • To do
          • rucio.cfg vs config table #2630 [Mario]
          • Handling of Archives in the Reaper #1431 [Thomas, Cedric]
          • Log the Parameters used in all POST/PUT requests #2686 [Thomas]
          • New Code management Model #3417 [Martin, Ben]
            • Tested GitHub Actions to automate testing of cherry-picks against release branches
            • Tests for PRs would still run in Travis; cherry-picks would be tested in GitHub Actions
            • Might make sense to move everything to GitHub Actions
          • RSEmgr version 2.0 #3147 [Tomas, Tobi]
          • QoS #3419 [Aris, Mario, Martin]
            • Some open conceptual decisions
            • Dedicated meeting for this
          • Python 3 #3420 [Martin]
        • Done
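To check whether the reaper2 memory growth discussed above comes from Python objects, one option is to sample the traced Python heap after each cycle with the standard-library tracemalloc module. A sketch, assuming a hypothetical run_cycle callable that stands in for one reaper iteration (it is not Rucio code):

```python
import tracemalloc
from typing import Callable, List


def measure_growth(run_cycle: Callable[[], None], cycles: int = 5) -> List[int]:
    """Run `run_cycle` repeatedly and return the traced Python heap size
    (bytes) after each cycle. A steadily increasing series hints at a leak.

    tracemalloc only sees allocations made through Python's allocators,
    so memory held by native libraries is not included.
    """
    tracemalloc.start()
    sizes: List[int] = []
    try:
        for _ in range(cycles):
            run_cycle()
            current, _peak = tracemalloc.get_traced_memory()
            sizes.append(current)
    finally:
        tracemalloc.stop()
    return sizes


def top_growth(run_cycle: Callable[[], None], limit: int = 3) -> None:
    """Print the source lines whose allocations grew most over one cycle."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        run_cycle()
        after = tracemalloc.take_snapshot()
        for stat in after.compare_to(before, "lineno")[:limit]:
            print(stat)
    finally:
        tracemalloc.stop()


if __name__ == "__main__":
    _cache = []

    def leaky_cycle() -> None:
        # Hypothetical stand-in for a reaper iteration that never frees data.
        _cache.append(bytearray(100_000))

    print(measure_growth(leaky_cycle))
    top_growth(leaky_cycle)
```

If the traced Python heap stays flat while the process memory keeps rising, the growth is more likely in native code (for example a C library such as gfal), which tracemalloc does not see.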
      • Traces [Thomas]
        • Trace infrastructure for CMS
        • Not easy to do, since there is no documentation or schema for traces
        • Only the Kronos daemon expects certain fields in the traces
        • Set up (and enforce) a base schema on the server
          • Reject and/or monitor traces that fail schema validation
        • The Kronos daemon has lots of ATLAS-specific logic
          • Kronos 2.0 makes the experiment-specific parts pluggable
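A base schema for traces could start as a table of required fields with expected types, with failing traces counted by reason so that they can be rejected, monitored, or both. A minimal sketch; the field names in BASE_SCHEMA and the helpers validate_trace and triage are hypothetical placeholders, not an agreed trace schema:

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical base schema: required field -> accepted type(s).
# These field names are placeholders; the real set would need to be agreed on.
BASE_SCHEMA: Dict[str, Tuple[type, ...]] = {
    "eventType": (str,),
    "scope": (str,),
    "filename": (str,),
    "timeStart": (int, float),
}


def validate_trace(trace: Dict[str, Any]) -> List[str]:
    """Return the schema violations of one trace; an empty list means valid."""
    errors = []
    for field, types in BASE_SCHEMA.items():
        if field not in trace:
            errors.append(f"missing field '{field}'")
            continue
        value = trace[field]
        # bool is a subclass of int, so exclude it explicitly for numeric fields.
        if isinstance(value, bool) or not isinstance(value, types):
            errors.append(f"field '{field}' has type {type(value).__name__}")
    return errors


def triage(traces: Iterable[Dict[str, Any]]) -> Tuple[List[Dict[str, Any]], Counter]:
    """Split traces into accepted ones and a Counter of rejection reasons,
    so that invalid traces can be rejected, monitored, or both."""
    accepted: List[Dict[str, Any]] = []
    rejected: Counter = Counter()
    for trace in traces:
        errors = validate_trace(trace)
        if errors:
            rejected.update(errors)
        else:
            accepted.append(trace)
    return accepted, rejected
```

Keeping the base schema small leaves room for experiment-specific fields, which consumers such as Kronos can check on their own.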
      • Monitoring [Cedric, Thomas]
        • For ATLAS, monitoring aggregations are done in the monitoring infrastructure
        • A light version of this would be useful for other communities too
        • A tool/daemon that performs this aggregation
      • Auditor #3437 [Dimitrios]
        • Went through the code
        • Started to work on the core function
        • Test cases are missing
        • Side-effects of only taking a dump with AVAILABLE replicas?
        • Object stores
          • Possible to get file lists from object stores (list buckets)
          • Still two lists to compare
          • Extra intelligence may be needed to handle corner cases
      • 2020-04-02
        • Handling of lost files in archives in the necromancer [Cedric, Tomas]
          • Tomas can look into it
          • Will require additional queries to check for archives
        • Auditor discussion [Dimitrios, Tomas]
          • Input is 2 files: DB dump and storage dump
          • Can Auditor not directly get DB information from Rucio (instead of relying on DB Dump)?
            • Possible to do both ways?
                • Difficult, since not all information on past replica states is available in the DB
          • The auditor compares the 2 states (DB and storage)
            • The auditor could just as well work on a DB dump (without generated PFNs) and generate the PFNs during processing
          • Pre, common and post actions
            • Directories for the DB and storage dumps are filled (externally)
            • The auditor runs and fetches data from the directories
            • The auditor produces the output
          • Dimitrios will create a ticket to collect ideas/workflows, and we will move forward from there
            • Collect use cases there and verify that it works (compared to the old auditor)
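The auditor comparison can be sketched as set operations over paths. This sketch assumes the DB dump contains only (scope, name) rows and generates paths during processing with an md5-based deterministic layout modeled on Rucio's default hash scheme. The exact layout is an assumption and should be checked against the RSE's real lfn2pfn algorithm. It also accepts an optional second DB dump taken after the storage dump, so files that changed state in between are reported as neither dark nor lost. The helpers hash_path, to_paths and compare are hypothetical names.

```python
import hashlib
from typing import Iterable, Optional, Set, Tuple

Row = Tuple[str, str]  # (scope, name) as read from a DB dump


def hash_path(scope: str, name: str) -> str:
    """Deterministic relative path for a DID (assumed md5-based layout)."""
    digest = hashlib.md5(f"{scope}:{name}".encode("utf-8")).hexdigest()
    # Assumption: user/group scopes are split into directories on '.'.
    if scope.startswith("user") or scope.startswith("group"):
        scope = scope.replace(".", "/")
    return f"{scope}/{digest[0:2]}/{digest[2:4]}/{name}"


def to_paths(db_dump: Iterable[Row]) -> Set[str]:
    """Generate the expected storage paths from DB dump rows."""
    return {hash_path(scope, name) for scope, name in db_dump}


def compare(db_before: Iterable[Row],
            storage_dump: Iterable[str],
            db_after: Optional[Iterable[Row]] = None) -> Tuple[Set[str], Set[str]]:
    """Return (dark, lost).

    dark: on storage but in neither DB dump
    lost: in every DB dump but not on storage
    Files present in only one DB dump changed state while the storage
    dump was taken and are excluded from both results.
    """
    before = to_paths(db_before)
    after = to_paths(db_after) if db_after is not None else before
    storage = set(storage_dump)
    dark = storage - (before | after)
    lost = (before & after) - storage
    return dark, lost
```

Bracketing the storage dump with two DB dumps is one way to handle the corner cases mentioned above without needing past replica states from the DB.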
    • 15:55 - 16:00
      AOB 5m
      • Group photo from the workshop
        • https://docs.google.com/drawings/d/1TdeT6Aq4U2tesW87NhoDEEOsQ4HFcRhYZu-OR7SCbdU/edit?usp=sharing
        • Please send selfies to Mario!