CTA deployment meeting

Europe/Zurich
31/S-023 (CERN)

Tape Lifecycle:

  • See https://gitlab.cern.ch/cta/CTA/blob/master/doc/GeneralTapeLifecycle.md
  • castor-tape-label won't be ported.
    • Instead, tape ops will write a labelling script that uses dd directly. Open question: LBP support. Setting the drive into LBP mode can be done using sdparm.
    • Dedications are not initially required for this use case; as a workaround, the labelling drives can be put down.
    • CTA core commands are required for checking whether a tape is empty, mounting/unmounting cartridges, and bringing a drive down. Vlado needs to check whether all the required tools are available from CTA core.
    • LBP and missing commands (if any) to be discussed offline between Vlado, Steve, Eric
  • Supply logic:
    • Supply logic will be implemented outside CTA (ops script)
    • Additional information required in DB (what are the supply pool(s) of a given tape pool)
    • Details to be discussed offline (Vlado, German, Steve)
  • cta-admin test/verify:
    • These are mock-ups, to be removed by Steve
  • Repair workflow:
    • tape-repair: re-injecting a file into CTA will be done via Repack (by creating the file in the EOS repack instance). This functionality needs to be documented (Cédric)
  • tape-drivetest:
    • Currently does two things: it checks that CASTOR is OK and that the drive is working. In CTA ops, the corresponding script should focus only on checking that the drive is working.
  • Reclaim:
    • When cta reclaim is invoked, all remaining information about the tape (logically deleted files, superseded files, etc.) shall be permanently removed from the DB. There won't be an "undo" operation for reclaim. The tape lifecycle workflow shall take this into account.
  • Dedications:
    • Drive and tape dedications will be required for full production but are not a priority right now, as there are workarounds for the current use cases.
  • Documentation:
    • Twiki is dead. All CTA operational documentation should be moved to Gitlab.
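
The supply logic item above (an ops script outside CTA, driven by a tape pool to supply pool mapping stored in the DB) could look roughly like the sketch below. This is only an illustration: the TapePool fields, the min_free threshold and the plan_supply function are hypothetical names, not part of CTA, and the real details are still to be agreed offline.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TapePool:
    """Hypothetical view of a tape pool as the ops script might load it from the DB."""
    name: str
    free_tapes: list[str]                 # VIDs of empty tapes currently in the pool
    min_free: int = 0                     # assumed threshold below which supply kicks in
    supply_pools: list[str] = field(default_factory=list)  # pools to draw tapes from


def plan_supply(pools: dict[str, TapePool]) -> list[tuple[str, str, str]]:
    """Return (vid, from_pool, to_pool) moves that top up pools below min_free."""
    # Work on copies so that planning never mutates the input objects.
    available = {name: list(pool.free_tapes) for name, pool in pools.items()}
    moves = []
    for pool in pools.values():
        deficit = pool.min_free - len(available[pool.name])
        for supply_name in pool.supply_pools:
            if deficit <= 0:
                break
            supply = available.get(supply_name, [])
            while deficit > 0 and supply:
                vid = supply.pop(0)
                available[pool.name].append(vid)
                moves.append((vid, supply_name, pool.name))
                deficit -= 1
    return moves


pools = {
    "supply": TapePool("supply", ["V00001", "V00002", "V00003"]),
    "atlas_raw": TapePool("atlas_raw", [], min_free=2, supply_pools=["supply"]),
}
for vid, src, dst in plan_supply(pools):
    print(f"move {vid}: {src} -> {dst}")
```

The function only produces a plan; actually moving tapes between pools would be done by whatever CTA command tape ops choose, which is deliberately left out here.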

 

Migration:

  • See slides attached to the agenda page.
  • Migration will proceed tape pool by tape pool. An intermediate table will contain everything for the tape pool being migrated.
  • The Oracle DBs must be physically co-located (being discussed with Nilo)
  • Files will have an identical fileID in CTA and EOS - the CASTOR file ID.
  • Migrated CASTOR tapes will be marked as EXPORTED which will block recalls. 
  • CASTOR migration routes will be removed, to prevent new files from being created on the corresponding CASTOR instance.
  • Scripts are 90% complete. gRPC insertion code is still being finished (Andreas).
  • ATLAS hasn't decided yet on the CASTOR namespace split. Michael to inquire with Cédric Serfon.
  • The actual migration is planned to take place once the new SSD-based disk servers are in place. This is likely to happen in September/October.
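
As a rough illustration of the per-pool approach above, the sketch below stages the files of one tape pool into an intermediate list, keeps the CASTOR file ID as the ID on both the CTA and EOS side, and collects the tapes that would then be marked EXPORTED. The record layout and the stage_tape_pool name are assumptions made for this example; they are not the actual migration scripts.

```python
def stage_tape_pool(castor_files, tape_pool):
    """Build intermediate rows for one tape pool and list the tapes to mark EXPORTED.

    castor_files: iterable of dicts with the hypothetical keys
                  'file_id', 'vid' and 'tape_pool'.
    """
    rows = [
        {
            "cta_file_id": f["file_id"],  # same ID in CTA ...
            "eos_file_id": f["file_id"],  # ... and in EOS: the CASTOR file ID
            "vid": f["vid"],
        }
        for f in castor_files
        if f["tape_pool"] == tape_pool
    ]
    # Every tape that holds a staged file is a candidate for the EXPORTED state.
    exported_vids = sorted({row["vid"] for row in rows})
    return rows, exported_vids


castor_files = [
    {"file_id": 1001, "vid": "T10001", "tape_pool": "atlas_raw"},
    {"file_id": 1002, "vid": "T10001", "tape_pool": "atlas_raw"},
    {"file_id": 2001, "vid": "T20001", "tape_pool": "cms_raw"},
]
rows, exported = stage_tape_pool(castor_files, "atlas_raw")
print(len(rows), exported)
```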

 

Repack testing:

  • CI stress testing: still leaving dangling files (500+1500 files), being investigated by Cédric. Some minor bugs, such as ensuring that tapes are FULL before processing and handling dangling files, are being fixed.
  • After 200K files (on a single tape), performance drops, but this is not critical for the time being. Implementing a fix for this will likely require additional sharding. 0.4% of the LHC tapes have more than 200K files; the average is 12K files.
  • Repack will take care of the EOS repack buffer below the configured directory prefix. This includes creating the necessary directory sub-structures, checking for sufficient buffer space, cleaning up migrated files, etc.
  • Vlado will follow up with Cédric regarding enabling repacking of disabled tapes (required while dedications do not exist)
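
The two tape-level conditions mentioned above (a tape must be FULL before it is processed, and performance drops beyond 200K files on a single tape) can be expressed as a small pre-check. This is a sketch only: the dictionary keys and the repack_precheck name are hypothetical and do not reflect how CTA itself implements the check.

```python
LARGE_TAPE_FILE_COUNT = 200_000  # file count beyond which the minutes report a performance drop


def repack_precheck(tape):
    """Return (eligible, reason) for a tape described by a hypothetical dict
    with the keys 'vid', 'full' (bool) and 'n_files' (int)."""
    if not tape["full"]:
        return False, "tape is not FULL"
    if tape["n_files"] > LARGE_TAPE_FILE_COUNT:
        return True, "eligible, expect reduced performance"
    return True, "eligible"


for tape in (
    {"vid": "T10001", "full": True, "n_files": 12_000},
    {"vid": "T10002", "full": False, "n_files": 5_000},
    {"vid": "T10003", "full": True, "n_files": 250_000},
):
    print(tape["vid"], repack_precheck(tape))
```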

 

Actions:

who | what | by when
German | Storage classes definition with Alessandro D.G. | 23/5
German | Point dCache people to CTA yaml build files | 17/5
Andrea | Set up FTS end point for testing multi-hop | 23/5
Eric | Agree with ATLAS on list of "activities" and configure via cta-admin. Deploy "activities" on ATLAS | 27/5
Eric | Deploy FIFO queueing on ATLAS | 3/6
Julien | CTA web site - add CERN instance description and links to monitoring | 23/5
Vlado, Steve, Eric | LBP and missing commands (if any) for labelling | 23/5
Vlado, Steve, Germán | Supply pool handling and required DB information | 23/5
Cédric | Document injection of repaired files into Repack | 23/5
Michael | Enquire with Cédric S. on namespace split-up | 23/5
Vlado | Follow up with Cédric regarding enabling repacking of disabled tapes | 23/5

 

 

 
