CTA deployment meeting

Europe/Zurich
31/1-012 (CERN)

Michael Davis (CERN)

Schema Validation Tool

We agreed that the Liquibase solution does what we want for the time being. The risk of software abandonment or lock-in is low, because we can easily replace Liquibase with something else.

Actions

  • Cédric: add check of schema version before performing the upgrade
  • Cédric: add an extra column to record whether the DB is in an intermediate ("upgrading") state. CTA must not be allowed to start while this flag is set.
  • Cédric: add procedure for upgrading the schema to the tape operations website.
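The first two actions combine into a simple guard: check the version before upgrading, mark the DB as "upgrading" while the change runs, and refuse to start CTA while that mark is set. The following is a minimal sketch of that logic using Python's built-in sqlite3 module. It is not the actual CTA catalogue code or the Liquibase workflow. The table CTA_CATALOGUE, the columns SCHEMA_VERSION_MAJOR, SCHEMA_VERSION_MINOR and UPGRADING, the constant EXPECTED_SCHEMA_VERSION and all function names are hypothetical, chosen for illustration.

```python
import sqlite3

# Hypothetical: the schema version this CTA build expects (not a real CTA constant).
EXPECTED_SCHEMA_VERSION = (1, 0)


class SchemaError(RuntimeError):
    """Raised when the catalogue is not in a state that is safe to use or upgrade."""


def read_schema_state(conn):
    """Return ((major, minor), upgrading) from the hypothetical CTA_CATALOGUE table."""
    row = conn.execute(
        "SELECT SCHEMA_VERSION_MAJOR, SCHEMA_VERSION_MINOR, UPGRADING FROM CTA_CATALOGUE"
    ).fetchone()
    if row is None:
        raise SchemaError("catalogue version row is missing")
    major, minor, upgrading = row
    return (major, minor), bool(upgrading)


def upgrade(conn, from_version, to_version, apply_changes):
    """Check the version, mark the DB as upgrading, apply changes, then clear the flag."""
    version, upgrading = read_schema_state(conn)
    if upgrading:
        raise SchemaError("a previous upgrade did not complete")
    if version != from_version:
        raise SchemaError(f"expected schema {from_version}, found {version}")
    with conn:  # commit the flag before touching the schema
        conn.execute("UPDATE CTA_CATALOGUE SET UPGRADING = 1")
    apply_changes(conn)  # if this raises, the flag stays set in the DB
    with conn:  # record the new version and clear the flag in one commit
        conn.execute(
            "UPDATE CTA_CATALOGUE SET SCHEMA_VERSION_MAJOR = ?, "
            "SCHEMA_VERSION_MINOR = ?, UPGRADING = 0",
            to_version,
        )


def check_before_startup(conn, expected=EXPECTED_SCHEMA_VERSION):
    """Refuse to start if the DB is mid-upgrade or at an unexpected version."""
    version, upgrading = read_schema_state(conn)
    if upgrading:
        raise SchemaError("schema upgrade in progress or interrupted; refusing to start")
    if version != expected:
        raise SchemaError(f"expected schema {expected}, found {version}")
```

The flag is committed before any schema change is applied. If an upgrade dies halfway, the DB therefore stays marked as "upgrading", and check_before_startup blocks CTA until an operator intervenes.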

 

ATLAS Recall Exercise

CTA v1.0-3 is deployed; EOS and FTS are running on XRootD 4.11.0.

Note that we have to use ctaprod for the recall exercise because the source and target DBs for migration have to be on the same Oracle instance, and ctaprod is the only DB schema available on the castor production instance.

Update after the meeting: the ATLAS migration was done on Friday/Saturday. Julien ran tests over the weekend. We are ready for the ATLAS recall exercise starting on Monday.

 

Other EOS+CTA Testing

The concurrent archival/retrieval/deletion "mutex" test and the Rucio+FTS multi-hop test are in progress.

 

Repack

Five tapes were repacked one-at-a-time.

The next test changed the mount policy to use three drives in parallel. There was an intervention on the library, so the drives had to be shut down. When they were brought back up, repack restarted but did not complete properly. This was identified as a logic problem in the queuing system when no drives are available.

Note: This problem is a general one which does not only affect repack.

Vlado will continue with tests when drives are available again (after ATLAS recall exercise): 10 tapes × 3 drives; 10 tapes × 5 drives; etc.

David is working on CTA repack automation scripts.

Actions

  • Fix queuing logic when no drives are available (see issues #736 and #737)

 

EOS Issues

  1. Issue with TPC identified as an XRootD bug introduced in v4.11.1. The workaround is to revert to v4.11.0 until a fix is available.
  2. Different XRootD versions and the resulting confusion: the EOS team has a solution; we need to ensure CTA is consistent with what they are doing.
  3. FST is inheriting from a class in the private area of XRootD, which caused an ABI incompatibility. In principle production software should not rely on private classes. To be resolved.
  4. QuarkDB: there is apparently a version available which does not depend on XRootD. However, we do not want to be the guinea pigs.
  5. EOS immutable files: it is imperative that we set the immutable file attribute on tape-backed directories. The issue with immutable files must be fixed in EOS before we go into production.

 

AOB

  • Julien will ask for more CI runners so that our pipelines can complete in a reasonable time.