Rucio Development Meeting

Europe/Zurich
Martin Barisits (CERN)
Description

Zoom: https://cern.zoom.us/j/413496641

Meeting ID: 413 496 641
Find your local number: https://cern.zoom.us/u/aT2QQfXAo

    • 15:00 15:10
      News 10m
      • Google Summer of Code
        • JupyterLab integration
        • Rucio native desktop app
        • RucioBot NLP
      • Folding@Home
        • Demonstrator (at CERN) for them to try
        • If they are happy, more permanent deployment at UChicago possible
      • Next 2 weeks no meeting
    • 15:10 15:20
      News from the experiments 10m
      • ATLAS
        • gfal/gridftp issue
          • Completely crashing on sites where we used gsiftp deletion
          • Parallel discussion on usage of globus toolkit
          • Not enough info for the moment
      • CMS
        • Open PRs for both core and containers
          • Container merged
          • Core PR will be closed
      • Belle II
        • Deployed first prototype for monitoring
        • A PR for Hermes will follow so that everyone can use the feature
        • Progress talk at the GDB yesterday
      • RAL 
      • DUNE/Edinburgh
        • MultiVO policy package
      • LDMX
        • Another simulation production run
          • No problems
    • 15:20 15:30
      Hot topics 10m
    • 15:30 15:55
      Developers roundtable 25m
      • Burn chart and progress
      • 1.23.0 LTS "The Incredible Donkey" priority followup
        • In Progress
          • Documentation overhaul [Martin, Dimitrios]
            • Some issues with the auto-build of the API documentation in MkDocs
            • All our documentation uses Sphinx annotations; trying to integrate this now
          • Expand Kubernetes Usage [Thomas]
            • New ATLAS cluster based on flux
            • Python 3 containers
              • Server build worked, daemons did not
              • gfal2 does not work (might need some special config)
            • New cluster for F@H
            • Lots of evicted pods on DOMA instance
              • Bug with logs and cleanup
              • Can be fixed by openstack team
          • AAI/OIDC Testing and Improvements [Jaroslav]
            • First full-stack Rucio-FTS-dCache transfers demonstrated
          • MultiVO Functionality #2635 [Eli, Patrick]
            • PR expected in the next few days
            • Progressing very well
          • Unification of metadata interfaces #3096 [Aris]
            • PR submitted, incorporating comments now
          • New Code management Model #3417 [Martin, Ben]
            • Port CI testing from Travis to GitHub Actions
            • Custom config for the matrix
          • Python 3 #3420 [Martin]
            • rucio setup.py fixed
            • Starting to test the py3 server again with Travis
          • QoS #3419 [Aris, Mario, Martin]
            • Talk to FTS devs for their QoS development plan
            • Need to see if we issue QoS transitions via FTS or directly to storage
          • Changing gfal protocol (adding protocol) [Mario]
            • Instead of the gfal API, use the gfal CLI (which can be cancelled)
            • Already works, but needs a lot of testing
            • Renaming protocols (but leave symlinks)
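As a rough illustration of why a CLI call is easier to cancel than an in-process API call, the sketch below runs a command as a child process and kills it when a timeout expires. This is only a minimal sketch under assumptions: the helper name `run_cancellable` and the timeout values are hypothetical, and it is not the actual Rucio protocol implementation.

```python
import subprocess
import sys


def run_cancellable(cmd, timeout_s):
    """Run cmd as a child process and kill it if it exceeds timeout_s seconds.

    Returns (returncode, stdout, stderr); returncode is None when cancelled.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    try:
        out, err = proc.communicate(timeout=timeout_s)
        return proc.returncode, out, err
    except subprocess.TimeoutExpired:
        proc.kill()                    # cancel the stuck operation
        out, err = proc.communicate()  # reap the child, collect partial output
        return None, out, err


# Stand-in command; a real call might look like ["gfal-rm", "<pfn>"],
# where the PFN is site-specific and not shown here.
quick = [sys.executable, "-c", "print('ok')"]
result = run_cancellable(quick, timeout_s=30)
```

Because the operation lives in its own process, a hang in the underlying transfer library cannot block the caller beyond the timeout.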
        • To do
          • Operators Documentation and recipe repository #2636 [Martin]
          • Page Listing config table and RSE Attribute Parameters #2631 [Martin]
          • rucio.cfg vs config table #2630 [Mario]
          • Handling of Archives in the Reaper #1431 [Thomas, Cedric]
          • Log the Parameters used in all POST/PUT requests #2686 [Thomas]
          • RSEmgr version 2.0 #3147 [Tomas, Tobi]
        • Done
      • 2020-04-30
        • Reaper 
          • Current reaper relies on data populated by probes (difficult)
          • Can we default this to Rucio-internal values?
            • Should work for used space
            • Total/Threshold more difficult
              • (Rucio cannot guess)
          • Thresholds?
          • Ticket -->
        • Client
          • 1.22.3
          • Doesn't find the generic package for some reason
        • Auditor
          • How to best help Dimitrios
          • Igor prototyped a function which is much faster, Dimitrios is testing
          • Second prototype from Igor should use less memory 
          • Where to store the experiment specific scripts/tools
            • Experiment repos, if possible public so we can link them
            • Some overlap with policy packages
          • Some testing with actual data would be helpful (feedback)
            • Interpretation of the actions taken by the tool
          • Followup offline meeting
      • 2020-04-23
        • Presentation about the Code Management Model from Ben
          • Q: How to collaborate on a single development?
            • Pull/Merge from personal branches. Then PR to rucio repository
        • Auditor
          • Decide on interface, development is mostly in the "policies"
      • 2020-04-16
        • GitLab vs GitHub
          • Is it worth moving (back) to GitLab?
            • At the moment no strong benefit, but this might change in the future
        • Auditor #3437 [Dimitrios]
          • Comparison with old auditor
          • Would be useful if CMS colleagues can test/compare the functionality as well
          • Unit tests missing, but should come soon
      • 2020-04-09
        • Auditor #3437 [Dimitrios]
          • Went through code 
          • Started to work on core function
          • Test cases are missing
          • Side-effects of only taking a dump with AVAILABLE replicas?
          • Object stores
            • Possible to get file lists from object stores (list buckets)
            • Still two lists to compare
            • Possible extra intelligence needed to handle corner cases
        • Monitoring [Cedric, Thomas]
          • For ATLAS, monitoring aggregations are done in the monitoring infrastructure
          • A light version of this would be useful for other communities too
          • Tool/Daemon which does this aggregation
        • Traces [Thomas]
          • Trace infrastructure for CMS
          • Actually not easy to do, since there is no documentation or schema
          • Only Kronos daemon expects certain fields in the traces
          • Setup (and enforce) a base schema on the server
            • Decline and/or monitor the traces failing schema validation
          • Kronos daemon has lots of ATLAS specifics
            • Kronos 2.0 makes the experiment-specific parts pluggable
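The idea of enforcing a base schema on the server, and declining or monitoring traces that fail it, could look roughly like the sketch below. The field names in `BASE_TRACE_SCHEMA` are hypothetical placeholders, not the fields Kronos actually expects, and the functions are illustrative only.

```python
from collections import Counter

# Hypothetical base schema: field name -> required Python type(s).
BASE_TRACE_SCHEMA = {
    "event_type": str,
    "scope": str,
    "name": str,
    "timestamp": (int, float),
}


def validate_trace(trace, schema=BASE_TRACE_SCHEMA):
    """Return a list of problems; an empty list means the trace passes."""
    problems = []
    for field, expected in schema.items():
        if field not in trace:
            problems.append(f"missing field: {field}")
        elif not isinstance(trace[field], expected):
            problems.append(
                f"wrong type for {field}: {type(trace[field]).__name__}"
            )
    return problems


def accept_traces(traces, schema=BASE_TRACE_SCHEMA):
    """Split traces into accepted ones and a per-problem failure count."""
    accepted, failures = [], Counter()
    for trace in traces:
        problems = validate_trace(trace, schema)
        if problems:
            failures.update(problems)  # keep counts so failures can be monitored
        else:
            accepted.append(trace)
    return accepted, failures
```

Extra, experiment-specific fields pass through untouched here, which leaves room for per-experiment plugins to interpret them.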
      • 2020-04-02
        • Handling of lost files in archives in the necromancer [Cedric, Tomas]
          • Tomas can look into it
          • Will require additional queries to check for archives
        • Auditor discussion [Dimitrios, Tomas]
          • Input 2 files: DB Dump, Storage Dump
          • Can Auditor not directly get DB information from Rucio (instead of relying on DB Dump)?
            • Possible to do both ways?
              • Difficult, since not all information is available in the db for past replica states
          • Auditor compares the 2 states (DB, Storage)
            • Auditor might as well work on DB dump (without generated PFNs) and generate the PFNs during processing
          • pre, common, post actions
            • Directories for the DB and storage dumps are filled (externally)
            • Auditor runs and fetches data from the directories
            • Auditor produces output
          • Dimitrios will create a ticket to collect ideas/workflows and we move forward from there
            • Collect use cases there, verify that it works (compared to old auditor)
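The comparison of the two states (DB, storage) discussed on 2020-04-02, with PFNs generated during processing rather than stored in the DB dump, can be sketched as a set difference. Everything named here is an assumption for illustration: `make_pfn`, its path layout, and the input shapes are hypothetical, and this is not the auditor's actual interface.

```python
def make_pfn(prefix, scope, name):
    """Hypothetical deterministic PFN builder; real layouts are site-specific."""
    return f"{prefix.rstrip('/')}/{scope}/{name}"


def compare_dumps(db_replicas, storage_paths, prefix):
    """Compare a DB dump of (scope, name) pairs with a storage dump of paths.

    Returns (missing_on_storage, unknown_to_db) as sorted lists of PFNs.
    """
    # Generate PFNs on the fly so the DB dump does not need to carry them.
    expected = {make_pfn(prefix, scope, name) for scope, name in db_replicas}
    found = set(storage_paths)
    missing_on_storage = sorted(expected - found)  # in the DB, not on storage
    unknown_to_db = sorted(found - expected)       # on storage, not in the DB
    return missing_on_storage, unknown_to_db


db = [("mc", "a.root"), ("mc", "b.root")]
storage = ["/data/mc/a.root", "/data/mc/c.root"]
missing, unknown = compare_dumps(db, storage, "/data/")
```

A single pair of dumps cannot distinguish real inconsistencies from files that changed between the two snapshots, so a production tool would need some handling for that window.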
    • 15:55 16:00
      AOB 5m