CTA deployment meeting

600/R-001 (CERN)



Michael Davis (CERN)

Notes from CTA Deployment Meetings w/c 31 August

Repacking tapes on CASTOR

Order of priority for repack operations:

1. Finish repacking `atlas_grid`, 31 tapes
2. Create the NA62 dual copy (`na62dual`)
3. Repack `public_user` to separate the `na62` and `*_grid` fileclasses and clean up `/user`

As we have to repack `public_user` in any case, we will try to clean up the `/user` part of the namespace beforehand as much as possible.

See [#878](https://gitlab.cern.ch/cta/CTA/-/issues/878): Clean up CASTOR /user namespace prior to CTA migration

Getting PUBLIC into Production

* Set up a DB instance on CASTOR for use in migration tests
* Set up a second EOSCTA PPS instance that can be used for read-only migration and recall tests
* This instance will be used for the ALICE migration test, the NA62 migration test and the NA62 recall tests

See [#35](https://gitlab.cern.ch/cta/operations/-/issues/35): NA62 tests on EOSPUBLIC PPS and [#54](https://gitlab.cern.ch/cta/operations/-/issues/54): Use of FTS archive monitoring experimental feature

Getting CMS into Production

* CMS 654 TB test complete (28/08/2020). This was a simple test of throughput from EOS CMS→EOSCTA CMS. No multi-hop or m-bit check. Waiting for Katy to return from holiday to get feedback.
* Multi-hop test next week (w/c 07/09/2020).
* m-bit test to be scheduled.
* This afternoon (03/09/2020) Mihai will present his plans for deploying a new version of FTS with the m-bit check.
* Tentatively, we will migrate CMS after LHCb (**February 2021**). However, this can change according to the schedule for the Phedex to Rucio migration. Potentially we can migrate CMS before the end of the year if they are ready.

See also [#59](https://gitlab.cern.ch/cta/operations/-/issues/59): Handle failed multi-hop transfers where physical and logical filenames are the same

Getting LHCb into Production

* (w/c 14/09/2020): Julien will start work on Dirac+CTA integration with Christophe Haen as soon as he gets back from holiday
* Our preferred solution is to use multi-hop DAQ → EOS LHCb → EOSCTA LHCb and to stage out to T1s from EOS LHCb.
* Christophe said (02/07/2020) "I believe we will do without multi-hop in FTS, as per our last discussion." but later said (24/08/2020) "The implementation for the pit export still needs to be refined, but this will anyway go first to EOSLHCb, so no direct impact on CTA once we are sure that EOSLHCb and CTA can talk to each other." This workflow needs to be clarified.
* TPC to T1s: Transfers to Gridka and the other dCache sites will be possible using XRootD TPC with delegation of credentials. Chris said, "...then nothing will prevent us anymore from using xroot as third party, providing it's deployed everywhere (and I really mean everywhere)". However, it will not be deployed everywhere: CNAF use StoRM and will not support this; they will support HTTP TPC instead. Chris said (25/08/2020), "For CNAF, since we have nothing special with respect to the other VOs there, I rely on the TPC task force to make it work there." Can LHCb handle some transfers with XRoot and some with HTTP?
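
To make the open question about mixed protocols concrete, here is a minimal sketch of choosing a TPC protocol per destination site. It assumes only what the notes state (dCache sites such as Gridka via XRootD TPC, CNAF/StoRM via HTTP TPC). The table names, function names and protocol strings are hypothetical illustrations and are not part of Dirac, FTS or CTA.

```python
# Hypothetical mapping from storage technology to the TPC protocol it is
# expected to support, based only on the statements in these notes.
TPC_PROTOCOL_BY_STORAGE = {
    "dCache": "root",   # XRootD TPC with delegation of credentials
    "StoRM": "https",   # HTTP TPC
}

# Hypothetical site table; only the two sites named in the notes are listed.
SITE_STORAGE = {
    "GRIDKA": "dCache",
    "CNAF": "StoRM",
}


def tpc_protocol(site: str) -> str:
    """Return the TPC protocol assumed for a destination site."""
    storage = SITE_STORAGE[site]
    return TPC_PROTOCOL_BY_STORAGE[storage]


def group_by_protocol(sites):
    """Group destination sites by the protocol their transfers would use."""
    groups = {}
    for site in sites:
        groups.setdefault(tpc_protocol(site), []).append(site)
    return groups


if __name__ == "__main__":
    print(group_by_protocol(["GRIDKA", "CNAF"]))
```

Whether the LHCb workflow can carry such a per-site choice is exactly the point that still needs to be clarified with LHCb.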

See also [Putting EOSCTALHCB into Production](https://codimd.web.cern.ch/Q2d7McJXRkCMU6uI2nO6Ig#)

Getting ALICE into Production

See [#55](https://gitlab.cern.ch/cta/operations/-/issues/55): ALICE instance in production

* Luca will release hardware mid-September
* Test migration mid-September
* **Wed 30 Sept.** `eosctaalicero` instance will be switched off and retired
* **Thu 1 Oct.** Switch off write access to CASTOR ALICE
* **Mon 5 Oct.** Block all access to CASTOR ALICE namespace, begin migration to CTA
* **Mon 12 Oct.** EOSCTA ALICE in production
* CASTOR T0ALICE will be returned to the ALICE pledge

    • 09:15 - 09:25
      Repacking tapes on CASTOR (10m)

      Status update on repack operations in preparation for migration:

      • ATLAS broken tapes
      • grid-atlas fileclass (~150 tapes)
      • NA62 dual copy (na62dual fileclass)
      • NA62 single copy (public_userna62 fileclass)
      • NA62 /grid namespace
    • 09:25 - 09:35
      Getting PUBLIC into Production (10m)
      • EOSCTA PUBLIC has been put into production and is ready to receive data for the BaBar data preservation project, see #40
      • Migration of SMEs from CASTOR to CTA, see Putting EOSCTAPUBLIC into Production
      • Notes from 7 August meeting with NA62 are on CodiMD
      • Update on NA62 tests, see #35
      • Migration of NA62 data, see #856
      • Michael and Vova will attend IT/SME meeting this afternoon
    • 09:35 - 09:45
      Getting CMS into Production (10m)
      • See Putting EOSCTACMS into Production
      • Which CMS tapes need to be repacked before migration? (grid_cms fileclass)
      • CMS Write Tests, see #41
      • m-bit check in FTS
      • EOS to CASTOR archiving tools used by CMS?
      • Schedule for CMS migration from Phedex to Rucio
      • Schedule for CTA commissioning tests
    • 09:45 - 09:55
      Getting LHCb into Production (10m)
      • See Putting EOSCTALHCB into Production (CodiMD)
      • Dirac integration: Julien will work with Christophe Haen as soon as he gets back from holiday (14 Sept. onwards)
      • Status of XRootD TPC with delegation. Test TPC copy to one of the LHCb T1s. Michael will attend TPC working group meeting on Wednesday.
      • Can we do a functional test of an XRoot TPC transfer to an LHCb T1 SE before Chris gets back? dCache/StoRM/ECHO
    • 09:55 - 10:05
      Getting ALICE into Production (10m)
      • See Putting EOSCTAALICE into Production
      • See EOSCTA ALICE proposal
      • The document appears to be proposing a change in CTA architecture towards a microservice-based deployment. This is a significant change from how we have deployed CTA up until now.
      • The proposal in its current form is not complete: not all workflows are covered, in particular how to keep metadata in sync between EOSCTAALICE and EOSALICE (e.g. when renaming files or directories).
      • Proposal does not have the support of the EOS team and does not yet address their concerns.
      • We can decide to (a) postpone deployment of ALICE until we have time to complete the document, seek the agreement of the EOS team, build a test system and debug it...
      • OR (b) we can deploy what we have to keep ALICE happy for now and revisit this after CMS and LHCb have been migrated.
    • 10:05 - 10:10
      AOB (5m)