Repack Status and plans for September production:
FTS:
Input from Michael:
My updates to action list:
1. Namespace split-up: agreed with Cedric 05/06/2019. See summary in last week's slides. Giuseppe to validate that files under /castor/cern.ch/atlas/atlascerngroupdisk are safely stored in EOS and don't need to be migrated. DONE
2. Georgios is doing a preliminary analysis intended to produce a set of metrics from which we can build an economic model of collocation (measuring the cost/benefit of different optimisation strategies).
New items:
1. Compile EOS version with required changes: gRPC API, new checksum protobuf format. This is done in EOS v4.5.0. (Also includes XRootD 4.10 and prepare request tracking, though these features are not required for migration) DONE
2. Merge CTA schema changes into master. I have rebased my branch on master, made required changes and will complete testing today. Will coordinate with Julien to merge back into master as he plans to do a release before the merge. Aim to have this done by Monday 1/7/2019.
3. Review final DB schema for migration. Deadline 3/7/2019 (before Giuseppe goes on holiday).
4. Create DB migration tools for ATLAS instance: alter schema and convert checksums and uid/gid to new format.
5. Update EOS namespace injection tools to use new gRPC API.
6. Small-scale metadata migration to validate all tools and workflow for the migration, including handling failure modes.
7. Milestone: CASTOR DB to be moved to new hardware. Propose to move CTA ATLAS DB during the same maintenance window. The date is to be set by the DB team; I believe it will be around 15 July. Giuseppe is coordinating with the DB team.
8. Milestone: Week 22-26 July: Full-scale ATLAS migration test (metadata only, no tapes). This is a functional and performance test. It will allow us to accurately estimate the time needed to do the real migration and to consider if we need to make any further optimisations.
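As a rough illustration of how the full-scale test in item 8 could feed a time estimate, the sketch below linearly extrapolates the throughput of a test run to a larger namespace. The class and function names, the optional safety factor and all figures are hypothetical and are not taken from any actual test; real throughput may well not scale linearly.

```python
from dataclasses import dataclass


@dataclass
class MigrationTestRun:
    """Outcome of a (hypothetical) metadata-only migration test."""
    files_migrated: int     # number of file records processed in the test
    elapsed_seconds: float  # wall-clock duration of the test


def estimate_migration_hours(run: MigrationTestRun, total_files: int,
                             safety_factor: float = 1.0) -> float:
    """Linearly extrapolate the test throughput to total_files, in hours."""
    if run.files_migrated <= 0 or run.elapsed_seconds <= 0:
        raise ValueError("test run must have processed files in positive time")
    rate = run.files_migrated / run.elapsed_seconds  # files per second
    return total_files / rate * safety_factor / 3600.0


# Made-up figures, for illustration only.
sample = MigrationTestRun(files_migrated=1_000_000, elapsed_seconds=500.0)
print(f"{estimate_migration_hours(sample, total_files=72_000_000):.1f} h")
```

A safety factor above 1.0 can be passed to leave margin for failure handling and retries.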
Update from Eric on backstop/backpressure status (timelines for September to be added):
Feature | Area | Status |
Disk system list (C++ struct): description of a disk system (name, regex to match file URLs, URL to query the free space). | Catalogue | Preliminary |
Disk system list management: storing and managing the disk system list. | Catalogue, frontend | To do, pending final definition of the C++ struct |
Support in retrieve request: attach the file system name. | Objectstore | Preliminary |
Support in retrieve queue: keep track of the file system name for queued requests. | Objectstore | To do |
Space allocation tracking object: keep track of the space committed but not yet used, per disk system. | Objectstore | To do |
Support in queuing: classify requests, add info in queue. | Scheduler | Preliminary |
Support in popping (the main part): integrate the querying of the space tracker and possibly the disk system, and requeue the requests in case of failure. | Scheduler | To do |
Support in retrieve mounts: keep track of (temporarily) full disk file systems. | Objectstore+scheduler | To do |
Support in mount scheduling: skip mounts for which we found no space (sleep the mount for 15 minutes). | Objectstore+scheduler | To do |
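The backpressure pieces listed in the table above can be sketched roughly as follows. This is a minimal Python illustration, not the CTA implementation (which is C++ and still being defined): the three `DiskSystem` fields come from the table, while every other name (`SpaceTracker`, `classify`, `try_pop`, the stubbed free-space query, the example host) is an assumption made for the example.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

MOUNT_SLEEP_SECONDS = 15 * 60  # the 15-minute mount sleep mentioned in the table


@dataclass(frozen=True)
class DiskSystem:
    name: str                  # disk system name
    file_regex: str            # regex matched against file URLs
    free_space_query_url: str  # URL to query for the free space


@dataclass
class SpaceTracker:
    """Bytes committed to popped requests but not yet used, per disk system."""
    committed: Dict[str, int] = field(default_factory=dict)

    def reserve(self, name: str, size: int) -> None:
        self.committed[name] = self.committed.get(name, 0) + size

    def release(self, name: str, size: int) -> None:
        self.committed[name] = max(0, self.committed.get(name, 0) - size)


def classify(url: str, systems: List[DiskSystem]) -> Optional[DiskSystem]:
    """Return the first disk system whose regex matches the URL, if any."""
    for system in systems:
        if re.search(system.file_regex, url):
            return system
    return None


def try_pop(url: str, size: int, systems: List[DiskSystem],
            tracker: SpaceTracker,
            query_free_space: Callable[[DiskSystem], int]) -> bool:
    """Return True if the request may proceed; False means requeue it."""
    system = classify(url, systems)
    if system is None:
        return True  # assumption: unmatched URLs are not subject to backpressure
    available = query_free_space(system) - tracker.committed.get(system.name, 0)
    if available < size:
        return False  # caller requeues and may sleep the mount
    tracker.reserve(system.name, size)
    return True


# Hypothetical disk system and a stubbed free-space query, for illustration.
systems = [DiskSystem("example-disk", r"^root://disk\.example\.org/",
                      "http://disk.example.org/free")]
tracker = SpaceTracker()
fake_free = lambda system: 100  # pretend 100 bytes are always reported free
first = try_pop("root://disk.example.org//a", 60, systems, tracker, fake_free)
second = try_pop("root://disk.example.org//b", 60, systems, tracker, fake_free)
print(first, second, tracker.committed)
```

In this sketch the second request is refused because the space already committed to the first is subtracted from the reported free space; a scheduler built along these lines would requeue it and skip the mount for `MOUNT_SLEEP_SECONDS`.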
Action list:
who | what | by when |
Eric | Agree with ATLAS on list of "activities" and configure via cta-admin. Deploy "activities" on ATLAS | 27/5 |
Cédric | Implement repacking taking into account disabled tapes and drive dedications | 30/5 |
Julien | Ensure CTA team is copied in exchanges with ATLAS and other experiments. | 24/5 |
Julien | talk to procurement and network people (to ensure all network infrastructure is in place when nodes arrive) | 30/5 |
Michael | Ensure that Georgios gets in touch with Luc to advance discussions on modelling collocation hints and assessing their usefulness. | 30/5 |
Julien/Andrea | Explicit stager_rm follow-up | 13/6 |
Andrea | Agree Rucio->FTS metadata format for collocation hints and storage classes | 13/6 |
Eric | Propose and discuss with the FTS team a format for receiving collocation hints (in addition to storage classes and activities) from FTS. | 13/6 |
Julien | Identify what is the right hardware to run migration | 13/6 |