Rucio Development Meeting

Europe/Zurich
Martin Barisits (CERN)

Zoom: https://cern.zoom.us/j/413496641

Meeting ID: 413 496 641
Find your local number: https://cern.zoom.us/u/aT2QQfXAo

    • 15:00 15:10
      News 10m
      • Release schedule
        • 1.22.3.post1
          • Hotfix for xCache URL mangling
      • Website
        • Please submit PR to e.g. update publication list
    • 15:10 15:20
      News from the experiments 10m
      • ATLAS
        • Issue with CVMFS deployment (do not overwrite the directory, due to caching effects)
        • Tape Carousel post-mortem in the coming weeks
          • Result might be interesting for other communities
      • CMS
        • Bringing Rucio into production for a specific set of data
        • Bug in Conveyor-Submitter
          • Not clear whether it affects the current version
          • To be confirmed whether it is an issue in 1.22
        • Actual prod DB will be created this week
      • Belle II
        • Monitoring 
          • Frontend (Grafana dashboard)
          • Matrix does not work (yet)
        • Work on message aggregator
          • For now separate from Hermes, but should probably be merged
      • MultiVO/RAL
        • Upgraded RAL instance to 1.21
          • Had some issues, but works now
          • Stopped sending tracing information (port closed - rejected)
          • Now works
      • DUNE
      • LVMX
        • Ready to start first large-scale production
        • (Use it only as a catalog)
        • Listing dataset-replicas does not work (needs a rule at least once)
        • Can use list-replicas --deep instead
        • Pilot project
    • 15:20 15:30
      Hot topics 10m
    • 15:30 15:55
      Developers roundtable 25m
      • Presentation about Code Management Model from Ben
        • Q: How to collaborate on a single development?
          • Pull/Merge from personal branches. Then PR to rucio repository
      • Burn chart and progress
      • 1.23.0 LTS "The Incredible Donkey" priority followup
        • In Progress
          • Documentation overhaul [Martin, Dimitrios]
            • Early phase of picking tools/deciding structure/content
              • Separation between generic / VO specific content
            • Not a lot of progress - need to get more hands-on, try things, and iterate
          • Expand Kubernetes Usage [Thomas]
            • Waiting for Ricardo for node investigation
            • Reaper2 memory usage increases constantly until the limit is hit, then it restarts
              • Confirmed by CMS too
                • ~50 RSEs processed in reaper
              • ATLAS made big jump to 300+ RSEs
              • Being investigated
              • Check memory usage
              • ATLAS sees this on reaper1 now as well
                • Related to gfal?
                • Needs followup
            • Debug features with attachable containers coming soon
            • MultiZone cluster available now
            • Increasing cluster size next week
            • Added more configuration parameters to chart (google secrets)
            • Account switcher @ webui
              • Should be easy to fix
              • Eric will try - patch will follow
            • Activating more daemons on k8s
          • AAI/OIDC Testing and Improvements [Jaroslav]
            • Test of propagation of account to transfertool
            • New patch release to deploy the recent developments on WLCG DOMA cluster
            • Testing transfers with FTS & dCache and OIDC auth
              • Auth flow with rucio-admin token does not work at the moment
              • Second mode: user token
                • Needs a fix
            • WebUI fix
            • Should plan continuous test efforts with multiple storages
          • MultiVO Functionality #2635 [Eli, Patrick]
            • Bringing work up to date
            • Meeting later on to specify next steps
            • Discussion: Administration of different VOs
              • Securing VOs, Accounts etc.
            • List of code parts which need specific changes to enable Multi-VO
            • Issue with migration script under py3.6 and oracle
              • Py3 server container would be very useful to test this
                • (Thomas will prepare)
            • Policy packages adaptation for MultiVO
          • Unification of metadata interfaces #3096 [Aris]
            • PR submitted, waiting for comments
          • New Code management Model #3417 [Martin, Ben]
            • Ben's presentation
          • Python 3 #3420 [Martin]
            • rucio setup.py fixed
            • Starting to test py3 server again with travis
          • QoS #3419 [Aris, Mario, Martin]
            • Some open conceptual decisions
            • Dedicated meeting for this
        • To do
          • Operators Documentation and recipe repository #2636 [Martin]
          • Page Listing config table and RSE Attribute Parameters #2631 [Martin]
          • rucio.cfg vs config table #2630 [Mario]
          • Handling of Archives in the Reaper #1431 [Thomas, Cedric]
          • Log the Parameters used in all POST/PUT requests #2686 [Thomas]
          • RSEmgr version 2.0 #3147 [Tomas, Tobi]
        • Done
      • Auditor
        • Decide on interface, development is mostly in the "policies"
      • 2020-04-16
        • GitLab vs GitHub
          • Is it worth moving (back) to GitLab?
            • At the moment no strong benefit, but might change in the future?
        • Auditor #3437 [Dimitrios]
          • Comparison with old auditor
          • Would be useful if CMS colleagues can test/compare the functionality as well
          • Unit tests missing, but should come soon
      • 2020-04-09
        • Auditor #3437 [Dimitrios]
          • Went through code 
          • Started to work on core function
          • Test cases are missing
          • Side-effects of only taking a dump with AVAILABLE replicas?
          • Object stores
            • Possible to get file lists from object stores (list buckets)
            • Still two lists to compare
            • Possible extra intelligence needed to handle corner cases
        • Monitoring [Cedric, Thomas]
          • For ATLAS monitoring aggregations are done in the monitoring infrastructure
          • A light version of this would be useful for other communities too
          • Tool/Daemon which does this aggregation
        • Traces [Thomas]
          • Trace infrastructure for CMS
          • Actually not easy to do, since there is no documentation and schema
          • Only Kronos daemon expects certain fields in the traces
          • Set up (and enforce) a base schema on the server
            • Decline and/or monitor the traces failing schema validation
          • Kronos daemon has lots of ATLAS specifics
            • Kronos 2.0 makes the experiment-specific parts pluggable
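Sketch for the Traces item above (base schema on the server, decline and/or monitor failing traces). This is only an illustration of the idea, not the Rucio implementation: the field names in BASE_TRACE_SCHEMA and the helper names validate_trace / accept_traces are hypothetical, since no trace schema has been agreed yet.

```python
from collections import Counter

# Hypothetical base schema: field name -> accepted type(s).
# The real set of mandatory fields still has to be decided.
BASE_TRACE_SCHEMA = {
    "eventType": str,
    "scope": str,
    "filename": str,
    "clientState": str,
    "timestamp": (int, float),
}


def validate_trace(trace, schema=BASE_TRACE_SCHEMA):
    """Return a list of problems; an empty list means the trace passes."""
    problems = []
    for field, expected in schema.items():
        if field not in trace:
            problems.append(f"missing field: {field}")
            continue
        value = trace[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"wrong type for {field}: {type(value).__name__}")
    return problems


def accept_traces(traces, schema=BASE_TRACE_SCHEMA):
    """Split traces into accepted ones and a counter of rejection reasons."""
    accepted = []
    rejected = Counter()
    for trace in traces:
        problems = validate_trace(trace, schema)
        if problems:
            # Declined traces are counted per reason so they can be monitored.
            rejected.update(problems)
        else:
            accepted.append(trace)
    return accepted, rejected
```

The Counter of rejection reasons could be exported to monitoring, which covers the "monitor" option; dropping the rejected traces covers the "decline" option. Experiment-specific fields would go on top of the base schema.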
      • 2020-04-02
        • Handling of lost files in archives in the necromancer [Cedric, Tomas]
          • Tomas can look into it
          • Will require additional queries to check for archives
        • Auditor discussion [Dimitrios, Tomas]
          • Input 2 files: DB Dump, Storage Dump
          • Can Auditor not directly get DB information from Rucio (instead of relying on DB Dump)?
            • Possible to do both ways?
              • Difficult, since not all information is available in the db for past replica states
          • Auditor compares the 2 states (DB, Storage)
            • Auditor might as well work on DB dump (without generated PFNs) and generate the PFNs during processing
          • pre, common, post actions
            • Directories for DB, Storage dump being filled (externally)
            • Auditor runs and fetches data from the directories
            • Auditor produces output
          • Dimitrios will create a ticket to collect ideas/workflows and we move forward from there
            • Collect usecases there, verify that it works (compared to old auditor)
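Sketch for the Auditor discussion above (two inputs, DB dump and storage dump, with PFNs/paths generated during processing). This is a minimal illustration under stated assumptions, not the existing auditor: the "scope:name" dump format, the db_entry_to_path mapping and all function names are hypothetical, and the sketch ignores replica states and files in transit.

```python
def read_dump(lines):
    """Collect the non-empty, stripped entries from an iterable of lines."""
    return {line.strip() for line in lines if line.strip()}


def db_entry_to_path(entry):
    """Hypothetical mapping from a DB dump entry 'scope:name' to a storage path.

    A real deployment would use the path algorithm configured for the RSE.
    """
    scope, name = entry.split(":", 1)
    return f"{scope}/{name}"


def compare_dumps(db_lines, storage_lines, to_path=db_entry_to_path):
    """Compare the two states (DB, storage) and report the differences."""
    # Paths are generated here, so the DB dump does not need to contain PFNs.
    db_paths = {to_path(entry) for entry in read_dump(db_lines)}
    storage_paths = read_dump(storage_lines)
    return {
        # In the DB dump but not found on storage.
        "missing_on_storage": sorted(db_paths - storage_paths),
        # On storage but not in the DB dump.
        "unknown_to_db": sorted(storage_paths - db_paths),
    }
```

Both arguments can be open file handles from the externally filled dump directories, since the function only iterates over lines. Passing a different to_path callable would cover non-default path conventions.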
    • 15:55 16:00
      AOB 5m