UK Rucio F2F meeting
Scope of UK activities
Various sites are doing different things:
- RAL aim to produce a production-quality multi-VO Rucio instance
- Imperial (Janusz) aim to integrate multi-VO DIRAC with Rucio
- Edinburgh (Teng Li): DUNE monitoring
- Edinburgh (James Perry): object store support
Monitoring
Teng presented his talk from GridPP.
‘System health’ (internal operations of the various components) is monitored with Graphite. Tracing calls in the Rucio code send stats to pystatsd. The Rucio dev team are working on a pull request to replace pystatsd, as it will not run under Python 3. Both ‘statsd’ and ‘collectd’ appear to be under active development, with recent GitHub commits.
Data transfers, deletions etc. send information to Apache ActiveMQ (Java code, but not Java Message Queue or JMS).
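For reference, the kind of message a tracing call hands to a StatsD daemon can be sketched with the standard library alone. This is a minimal illustration of the plain-text StatsD line format sent over UDP, not the code Rucio or pystatsd uses; the function names, the default host and the default port 8125 are assumptions made for the example.

```python
import socket


def format_counter(name: str, value: int = 1) -> str:
    """Build a StatsD counter line of the form '<name>:<value>|c'."""
    return f"{name}:{value}|c"


def format_timer(name: str, millis: float) -> str:
    """Build a StatsD timer line of the form '<name>:<milliseconds>|ms'."""
    return f"{name}:{millis:g}|ms"


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one metric line over UDP (fire-and-forget, no reply is expected).

    The host and port are assumed defaults for a locally running daemon.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```

A caller would do something like send_metric(format_counter("example.transfers.submitted")), where the metric name is a made-up placeholder. Because the transport is UDP, a missing or slow daemon does not block the code being traced.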
AENEAS / ESCAPE - Experience, Requirements, Plans and Schedule
Monitoring requirements:
- Visual representation of functional tests, providing info to determine why transfers are failing, e.g. to a particular site: has the link gone down, or is there a protocol mismatch?
- Show levels of RSE usage. The WebUI offers this, but it doesn’t appear to work.
- Break down usage by scope or account (e.g. are some users transferring/storing more data than others).
- Check ‘liveness’ of replication (useful when state is stuck at ‘Replicating’: is there a problem with the transfer tools, or was the FTS link not generated due to a bug?). Rucio doesn’t record this per se.
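The ‘liveness’ requirement above could be approximated by flagging anything that has sat in the replicating state for longer than some threshold. The sketch below works on plain dictionaries; the field names (id, state, updated_at), the state string REPLICATING and the 24-hour default are assumptions for illustration, and fetching the records from a Rucio instance is left out.

```python
from datetime import datetime, timedelta


def find_stale_rules(rules, now, max_age=timedelta(hours=24)):
    """Return the rules still in REPLICATING state whose last update is older than max_age."""
    cutoff = now - max_age
    return [
        rule for rule in rules
        if rule["state"] == "REPLICATING" and rule["updated_at"] < cutoff
    ]


# Hypothetical records, invented for this example.
now = datetime(2019, 10, 1, 12, 0)
rules = [
    {"id": "r1", "state": "REPLICATING", "updated_at": datetime(2019, 9, 28, 9, 0)},
    {"id": "r2", "state": "REPLICATING", "updated_at": datetime(2019, 10, 1, 11, 0)},
    {"id": "r3", "state": "OK", "updated_at": datetime(2019, 9, 1, 0, 0)},
]

stale = find_stale_rules(rules, now)
print([rule["id"] for rule in stale])  # only r1 is both replicating and older than 24 hours
```

A check like this only says that something has not moved; it does not say why, so it would complement rather than replace the functional-test view.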
Functional requirements:
- Permissions – provide isolation within the same VO. Some classes of user (particularly astronomers) will not wish to divulge their data to others. ESCAPE have development effort to work in this area. One suggestion is scope-local permissions.
- Desire for co-location and popularity-based replication – informed by processing stats. (Is sufficient data presently gathered?)
- Desire for a Rucio ‘lightweight client’ – I took this to be something self-contained to just up- or download data e.g. when using a new machine (could clarify with Rohini).
- Workflow Management System integration – create an end-to-end use case.
There have been problems with FTS links disappearing; the timeout on FTS Dev might be increased.
Plan to get data from more nodes, e.g. MeerKAT (SKA pathfinder): set up rules to move MeerKAT data to the UK. There was a suggestion about using Ceph replication, but access to e.g. processed data needs to be controlled (hence it might be easier to sell Rucio replication with rules rather than Ceph replication). MeerKAT talk at CERN Ceph Day, September 2019: https://ceph.com/cephdays/ceph-day-cern-2019/
ESCAPE
WP2 is to create a data lake. WP5 is to build a science analysis platform as a prototype of some of the SRC processing. This will show the utility of metadata-based searching.
There is an ESCAPE Rucio instance hosted at CERN. Rohini has a separate Kubernetes instance for development.
Things they were wondering about:
- Would there be a way to detect accounts looking at data ‘belonging’ to other accounts?
- Does metadata searching require a separate metadata catalogue?
- MeerKAT presentation at Ceph Day – S3 syncing IDIA to RAL? How could Rucio use ‘synced RSEs’?
- Is it possible to move a Rucio instance between hosts, e.g. by recreating rules or exporting/importing the Rucio configuration tables?
Multi-VO Rucio
Ian presented Andrew Lister’s slides from the GridPP meeting.
RAL aim to have a development instance for multi-VO DIRAC developers to work against.
Upgrade current instance to support multi-VO.
Object Store support
James’s pull requests to allow direct access to object stores have been accepted. Developers were worried that signing so many pre-signed URLs would use significant CPU time; however, this appears not to be the case.
James is updating the configuration to allow VO-specific permissions to be set more easily.
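The CPU-cost question for pre-signed URLs can be sanity-checked with a rough timing of HMAC-based signing. The sketch below uses a deliberately simplified scheme (one HMAC-SHA256 over the URL plus an expiry); it is not the S3 signing algorithm and not the code in the accepted pull requests, and the key, host name and query parameter names are placeholders.

```python
import hashlib
import hmac
import time


def sign_url(secret: bytes, url: str, expires: int) -> str:
    """Append an expiry and an HMAC-SHA256 signature to a URL (simplified scheme)."""
    payload = f"{url}?expires={expires}"
    signature = hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{payload}&signature={signature}"


def time_signing(n: int = 10_000) -> float:
    """Return the seconds taken to sign n distinct placeholder URLs."""
    secret = b"not-a-real-key"  # placeholder key for the timing run only
    start = time.perf_counter()
    for i in range(n):
        sign_url(secret, f"https://objectstore.example/bucket/file{i}", 1_600_000_000)
    return time.perf_counter() - start
```

Calling time_signing() on the machine of interest gives a rough per-URL cost. Real signing schemes chain several HMAC operations per URL, so the figure should be read as an order-of-magnitude indication only.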
Development plans for integrating Multi-VO Rucio and DIRAC
The draft plan that was discussed at the DIRAC workshop in May 2019 was looked at again:
https://docs.google.com/document/d/1W5F3VZBtt3_J5ST6CadJDzHOjz7LMMAhcipY83wg3Xc/edit
Plan:
- Janusz is looking at the Rucio File Catalogue Plugin
- 11th October GridPP Technical meeting: Janusz to present a proposal of how DIRAC will expect Rucio to behave (https://indico.cern.ch/event/849681/).
- Explain how DIRAC works.
- List DIRAC data management commands.
- Ian J to look into the RESTful interface for DMC commands
AOB:
Recent GDB meeting:
https://indico.cern.ch/event/739882/
CHEP19:
Mario presenting: Evaluating user experience Rucio talk from .
RAL: Multi-VO Rucio work.
Rucio coding camp:
https://indico.cern.ch/event/819753/timetable/#20191015
Someone should attend the AENEAS close-out meeting. The UK should always have a presence at the Rucio development meeting.