Rucio Development Meeting

3196/R-023 (CERN)



Martin Barisits (CERN)
    • 3:00 PM 3:10 PM
      News 10m
      • Meeting format
        • Protected material
        • Pre-Filling of minutes and topics to discuss
      • Meeting room
        • ATLAS P1 for 2018
        • 2019 meeting room closer to main building
      • New Rucio project logo
        • Will be integrated in the website
        • Please use this one for presentations etc.
      • 2nd Rucio community workshop
        • Call for Hosting until Oct 19
      • Interest for a Rucio Coding camp in December 2018?
        • 2 days at CERN
        • Pre-select some specific developments and use two focused days to work on these
    • 3:10 PM 3:30 PM
      News from the experiments 20m
      • CMS developers at DUNE workshop at the moment
        • Currently working on Kubernetes setup
      • DUNE workshop in Edinburgh - Cedric attended
        • Possible manpower to do some development in Rucio, possibly on object stores
        • Data Management session 
          • Cedric showed slides
          • Tape management different - More similar to STAGING area
      • Ian at RAL migrating to PostgreSQL
        • Migration recipe - Should put it on ReadTheDocs
      • Oracle schema from ATLAS DBA
        • Iterating through it to get it in sync with github/schema.sql
      • News on CMS instance
        • Everything set up on Kubernetes cluster except authentication
        • No certificate to move data around
      • LIGO / IceCube installation
        • Making progress: Moving files between RSEs
    • 3:30 PM 3:50 PM
      Hot topics 20m
      • Hanging Judge evaluator
        • Recently a hanging Judge evaluator led to some unwanted replica deletion
        • As the application of replication rules (replica locks) to files is done asynchronously for performance reasons, a specific workflow led to the deletion of replicas which should not have been deleted
          • 1: File1 with Replica1A gets added to DatasetX (which has a rule for RSE A)
          • 2: DatasetX gets deleted --> Rule gets removed --> Replica1A gets a tombstone
          • 3: File1 gets added to DatasetY (which has a rule)
            • As the Judge evaluator was hanging, Replica1A never got the tombstone removed
          • 4: As RSE A is full, Replica1A gets deleted within 3-4 hours
        • This was the first time the evaluator was hanging since 2014
        • The workflow was adapted to not remove DatasetX immediately, but give it an expiration of 6h
        • Discuss if #1578 should be implemented, which would prevent the reaper from deleting data if it detects a backlog in the evaluator
          • --> Go ahead with the check
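The check agreed on for #1578 could look roughly like the sketch below. It is a minimal illustration, not Rucio's actual implementation: `Replica`, `evaluator_backlog_age`, `select_deletable` and the 10-minute threshold are hypothetical names and values chosen for this example only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Hypothetical threshold: how old the oldest unprocessed evaluator entry
# may be before the reaper refuses to delete anything.
MAX_BACKLOG_AGE = timedelta(minutes=10)


@dataclass
class Replica:
    name: str
    rse: str
    tombstone: Optional[datetime]  # None means the replica is protected


def evaluator_backlog_age(pending_since: List[datetime], now: datetime) -> timedelta:
    """Age of the oldest attachment still waiting for the evaluator."""
    if not pending_since:
        return timedelta(0)
    return now - min(pending_since)


def select_deletable(replicas: List[Replica],
                     pending_since: List[datetime],
                     now: datetime) -> List[Replica]:
    """Return replicas with an expired tombstone, or nothing if the evaluator lags."""
    if evaluator_backlog_age(pending_since, now) > MAX_BACKLOG_AGE:
        # Locks may not have been applied yet, so tombstones cannot be trusted.
        return []
    return [r for r in replicas
            if r.tombstone is not None and r.tombstone <= now]
```

With such a guard, step 4 of the workflow above would not have run while the evaluator was hanging, because the pending attachment of File1 to DatasetY would have counted as backlog.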
    • 3:50 PM 4:20 PM
      Developers roundtable 30m
      • Custom configuration / policy / permission files #533
        • Currently part of the repository & package
        • Should be removed from main repository and made simple for users to add
        • Load the module or path in the configuration (Can be path or module)
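One way to accept either a module name or a file path in the configuration is sketched below, using only the Python standard library. The function name `load_policy` and the rule for telling a path from a module name are assumptions made for illustration; #533 does not prescribe them.

```python
import importlib
import importlib.util
import os


def load_policy(value):
    """Load a policy/permission module from a dotted module name or a .py path."""
    if value.endswith(".py") or os.sep in value:
        # Treat the configured value as a filesystem path.
        name = os.path.splitext(os.path.basename(value))[0]
        spec = importlib.util.spec_from_file_location(name, value)
        if spec is None or spec.loader is None:
            raise ImportError("cannot load policy from {!r}".format(value))
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module
    # Otherwise treat it as an importable module name.
    return importlib.import_module(value)
```

A site could then point the configuration at something like `/etc/rucio/my_permission.py` or at an installed module name (both hypothetical examples) without patching the main repository.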
      • Protection of sources by the reaper #1637
        • Currently the reaper is potentially over-protective of source replicas
        • Changes possible/necessary
          • If 1 source, do not allow deletion
          • If 2 sources or more, always keep the alphabetically first one
            • Issue for alphabetical selection?
              • Should not be, as replica becomes eligible after transfer finished
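The proposed selection rule can be written down in a few lines. This is a sketch of the rule as discussed, not reaper code; `deletable_sources` is a hypothetical helper and the RSE names in the usage note are placeholders.

```python
def deletable_sources(source_rses):
    """Return the source RSEs the reaper may delete from."""
    ordered = sorted(source_rses)
    if len(ordered) <= 1:
        # A single source is never deleted.
        return []
    # Keep the alphabetically first source, release the rest.
    return ordered[1:]
```

For example, with sources `["RSE_C", "RSE_A", "RSE_B"]` only `RSE_A` stays protected, while a single-source transfer keeps its only replica.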
      • Python 3 plan #1505 and #67
        • Via pylint --py3k
        • Hannes will prepare a script
      • Rucio sometimes returns a 404 for a list-replicas call, even if the file exists #1568
        • Rerun arcls against the integration service and try to find the cause from the logfile
      • Message payload #48
        • Add new column (CLOB) and only write messages in there
          • Only context switch when the non-clob column is empty
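One possible reading of this proposal is sketched below: payloads that fit stay in the existing column, oversized ones go to the new CLOB column, and readers touch the CLOB only when the regular column is empty. The column roles, the function names and the 4000-character limit are assumptions for illustration, not the agreed schema.

```python
from typing import Optional, Tuple

# Hypothetical size limit of the existing non-CLOB payload column.
PAYLOAD_LIMIT = 4000


def split_payload(payload: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (regular column value, CLOB column value) for a message."""
    if len(payload) <= PAYLOAD_LIMIT:
        return payload, None
    return None, payload


def read_payload(regular: Optional[str], clob: Optional[str]) -> Optional[str]:
    """Fall back to the CLOB column only when the regular column is empty."""
    return regular if regular else clob
```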
      • ZIP Files #1091
        • Open ticket for list_replica inconsistencies
      • Conveyor-consumer wrong handling of multi source jobs #704
        • In preparation
      • Kubernetes
        • traefik released now, needs to be integrated at CERN
        • For WebUI minimal work, for authentication server more complex
      • Database full for DOMA instance
        • message_history, rules_hist_recent, requests_history
      • Nagios probes #1638
        • Need to evaluate what we want to use instead of Nagios
        • Should be a separate repository for probes (maybe a common one)
        • Also probes which create network metrics
    • 4:20 PM 4:30 PM
      AOB 10m
      • Development meeting schedule for 2018
        • Development meeting every week except
          • 15. Nov 2018 - SC'18
          • 13. Dec 2018 - ATLAS Software & Computing week
            • Maybe CMS people can join ATLAS DDM Session?
          • 20. Dec 2018 - pre CERN-closure ???
            • People might be on vacation already - will check closer in time
          • 27. Dec 2018 - CERN closure
        • Any CMS constraints?
          • Computing week in 2 weeks, but CMS guys would like to join Rucio Dev Meeting in person
        • First meeting in 2019: 10. Jan 2019
      • Dev meeting on the 25. Oct 2018:
        • Rucio 1.19.0 "Fantastic Donkey" release planning