Rucio Development Meeting
Europe/Zurich
Martin Barisits (CERN)
15:00 → 15:10 News (10m)
- Full Tele-Working at CERN (Safe Mode)
- Rucio Workshop
- Given the COVID-19 situation and the many remote participants, the workshop went very well
- 80 participants
- 29 presentations
- Coding Camp
- Was cancelled
- Hopefully we can have another one later this year
- Release schedule
- 1.21.12 was released on Monday
- 1.22 "Green Donkey"
- Code Freeze since Monday until ~Mar-27
- 1.22.0rc1 Today
- More RCs as needed
- Friday Mar-27 1.22.0 final
- 1.22.1 patch release on Apr-06
- Next week 1.23 "The incredible Donkey" release planning
- New CERN technical student
- Ben
15:10 → 15:20 News from the experiments (10m)
- ATLAS
- EOS CTA tests from SFO
- New HTTPD server configuration
- Was running in prefork MPM mode, switched to event MPM mode
- Better performance
- Issue: Multiple mod_wsgi containers with separate session pools
- Not well handled in default config
- Added wsgi_groups and linked all endpoints
- Shared connection pool across all REST endpoints
- Much better performance
- Suggestion: Make this the default!
- CMS
- Successfully tested multihop
- Issue
- For multihop, the 2nd transfer is submitted (and fails) before the first one has finished
- Possible FTS bug?
- Cedric will have a look and test it on ATLAS again
- MultiVO
- DB schema upgrade is merged
- Belle II
- Started to work on chained subscription mode
- First prototype next week
- Globus Online
- Reaper development for GO
- Deletion enhancement (2 phases?)
- Testing to start with ATLAS
- Lessons learned then applicable for light sources too
- LDMX
- Rucio to be used as a data catalog/metadata catalog
- Add more operators (more than just equals) to metadata queries
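To illustrate the LDMX request for operators beyond equality, here is a minimal sketch of filter expressions evaluated against a metadata dictionary. The operator table, the expression syntax and the names `parse_filter` and `matches` are assumptions made for this example; they are not the Rucio API, and the syntax Rucio adopts may differ.

```python
import operator
import re

# Hypothetical operator table for illustration only.
OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "!=": operator.ne,
    "=": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}

# Two-character operators come first in the alternation so that ">=" is
# not read as ">" followed by a value starting with "=".
_FILTER_RE = re.compile(r"^\s*(\w+)\s*(>=|<=|!=|=|>|<)\s*(.+?)\s*$")


def parse_filter(expression):
    """Split 'key<op>value' into (key, comparison function, value)."""
    match = _FILTER_RE.match(expression)
    if match is None:
        raise ValueError(f"cannot parse filter: {expression!r}")
    key, symbol, raw_value = match.groups()
    try:
        value = float(raw_value)  # numeric values compare numerically
    except ValueError:
        value = raw_value  # everything else stays a string
    return key, OPERATORS[symbol], value


def matches(metadata, expressions):
    """Return True if the metadata satisfies every filter expression."""
    for expression in expressions:
        key, compare, value = parse_filter(expression)
        if key not in metadata:
            return False
        try:
            if not compare(metadata[key], value):
                return False
        except TypeError:
            # Ordering comparison between incompatible types, e.g. str vs float.
            return False
    return True
```

For example, `matches({"run_number": 120, "beam": "electron"}, ["run_number>=100", "beam=electron"])` evaluates both conditions and requires each of them to hold.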
15:20 → 15:30 Hot topics (10m)
- Globus Online Testing
- Support implemented in Rucio
- Needs testing within ATLAS
- Start to test transfers first, later on also do deletion
- Currently in ATLAS:
- Harvester manages transfer to Globus endpoint
- Would like to use Rucio to manage these transfers since US HPC facilities might move away from offering GridFTP endpoints
- Questions/Issues/Ideas
- Authentication
- Each transfer has its own refresh token
- Matt: There is a forever token tied to the auth_client, so no individual tokens are needed
- Need to handle Globus credentials (Safe way to inject/store)
- How to handle this in the Multi-VO world?
- Right now: Different submitters for each VO
- Integration Testbed
- Need to populate it with some data
- Should be integrated into Panda testtools as well
- 2 Endpoints we can use: SLAC and BNL (dual use)
- No GLOBUS-only endpoints yet
- Will set up a daemon on the ATLAS integration cluster
- Transfer/Scale testing
- Once we are happy with transfers, also test reaper
- Need to plan for 2 rules
- WLCG <-> EDGE <-> GLOBUS
- No multi-hop for this (Yet?)
- EDGE node supports both WLCG world and GLOBUS world
- Transfer limitations in GO
- 100 transfers concurrently
- Need to bulk them (Is there a bulk limit? 1Mio per transfer?)
- Can submit more, GO will queue them?
- Monitoring
- Need to get the transfer events for GO also into the CERN monit stream
- Further communication and followup on Slack: https://rucio.slack.com/archives/C8Z07UWKV
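As an illustration of the bulking question raised under "Transfer limitations in GO", here is a minimal sketch of grouping file transfers into bulk tasks and holding back whatever exceeds the concurrency limit. The figure of 100 comes from the notes above; the per-task file limit of 1000 and the names `chunked` and `plan_bulk_tasks` are assumptions for this example only. No Globus call is made, and the real bulk limit is still an open question.

```python
from itertools import islice

MAX_ACTIVE_TASKS = 100     # concurrency figure quoted in the notes
MAX_FILES_PER_TASK = 1000  # assumed value; the real bulk limit is not known


def chunked(items, size):
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def plan_bulk_tasks(file_pairs,
                    files_per_task=MAX_FILES_PER_TASK,
                    max_active=MAX_ACTIVE_TASKS):
    """Split (source, destination) pairs into bulk tasks.

    Returns (to_submit, to_hold): the tasks that fit under the concurrency
    limit and the remainder to be submitted once slots free up.
    """
    tasks = list(chunked(file_pairs, files_per_task))
    return tasks[:max_active], tasks[max_active:]
```

Whether the held-back tasks need to be kept on the Rucio side at all depends on the answer to "Can submit more, GO will queue them?".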
15:30 → 15:55 Developers roundtable (25m)
- Issue burn update
- Rucio 1.22 "Green Donkey" priority development followup
- In Progress
- Kubernetes [Thomas]
- Got the new network controller! Single-Zone calico network controller (Nodes need to be in one availability zone though)
- Was mapped manually by Ricardo
- Improved the queries, but still not the same as puppet servers
- With the change to MPM event mode, it gets quite close to the puppet servers
- Problem getting logs out of the server on the old cluster (network was too slow)
- CERN IT team working on multi zone network controller
- Once these issues are fixed, we can move more services over
- Upload migration PR has been open for some time, probably needs to be re-submitted for testing
- Unification of metadata #3096 [Aris]
- Rucio core part is mostly done
- API/CLI work needs to be done
- Re-use existing calls
- ToDo
- Documentation [Martin]
- rucio.cfg vs config table #2630 [Mario]
- Versioned history tables need explicit definition #2063 [Martin]
- Handling of Archives in Reaper #1431 [Cedric, Thomas, Mario, Martin]
- Missing features for Archives #1091 [Mario, Martin]
- Documentation page listing config table and RSE attributes #2631 [Mario]
- Operators doc and repo #2636 [Martin]
- Dimitrios hopes to start an effort mid-February
- Some ATLAS parts might be required to stay confidential
- Done
- MultiVO #2635 [Ian, Eli, Patrick]
- Open PR works fine now; will be merged into the coming feature release
- Andrew prepared a few other things which can be improved and submitted incrementally (probably)
- Repository cleanup #5252 [Martin, Thomas]
- Configuration / Permissions / Policy #533 [James]
- Cleanup of Issues [Everyone]
- Reaper 2.0 #2412 [Cedric, Thomas]
- Protection of sources too strict in reaper #1637 [Cedric, Thomas]
- OIDC #2612 [Jaroslav]
- Reaper should stop in case of judge backlog #1578 [Martin]
- Add file to content history in reaper #37 [Cedric, Thomas]
- Done, but one bug to fix due to contents_history partitioning
- Auditor
- Contacted by CMS who want to continue on Auditor work
- Dimitrios and Thomas will try to give pointers, probably should re-write some parts of the auditor into an auditor2
- Third party copy read and write needs to be enabled [Martin, Cedric, Dimitrios]
- Duplicate the information for both operations in the probe for now
- Need to iterate with AGIS to get TPC_write and _read in there
- No news
- CRIC progress [Panos]
- Review probes [Thomas, Dimitrios]
- Dimitrios needs some help with reviewing the probes, since he is overloaded
- Thomas will help with reviewing
- No news :-)
- RPG development [Tomas]
- Tomas will write a document about the project definition for RucioRPG
- Needs more discussion
- Issue with Reaper2 which unnecessarily deletes secondary replicas due to replicas in wrong state
- staging_started_at / staging_finished_at working now? Need to check
15:55 → 16:00 AOB (5m)