CTA deployment meeting

600/R-001 (CERN)

Michael Davis (CERN)

Getting LHCb into Production

  • There are several blockers on the XRootD + EOS side; fixes are in the pipeline. Julien will follow up at tomorrow's x-section meeting.
  • HTTP+TPC: we should be able to test transfers with CNAF, in both directions, using delegated X.509 credentials.
  • In parallel, Julien will continue with configuring and testing token authorisation. This is not a blocker for the LHCb migration.
  • LHCb DAQ tests to EOS have started, but they are experiencing some issues: throughput is capped at 1.5 GB/s per machine, whereas it should reach 10 GB/s. Maria is debugging this with Chris and the networking team. It makes sense for us to wait until these issues are resolved before we run any tests on our side.

Getting PUBLIC into Production

  • (From IT/SME meeting) AMS are keen to resume testing, as they are blocked by the 1 million file limit. The filename structure they use seems to be hard-coded in many of their scripts and would take some time to fix. These are really two separate issues: the CTA migration, and the problems created by having too many files in a directory. This should be followed up at the x-section meeting.
  • The CAST errors seem to be caused by CAST retrying transfers of files that have not yet been written to tape. Since the retried file has the same filename, the retry fails. Julien is following up, see INC2684046.
  • Repack of r_public_user: 150 tapes to go.

CTA Status Update

  • Cedric will deploy 3.2-1, then 4.0-1.
  • Reclaiming a tape will empty the recycle bin and remove the IS_FROM_CASTOR flag so that it can be added to a supply pool. Files cannot be recovered from the tape after it has been reclaimed.
  • Next, Cedric will investigate issue #930 ("CTA Frontend crashes after removal of repack request with 15000 files").
    • 14:00-14:10
      Getting LHCb into Production 10m


      • 1-12 February: commissioning tests
      • 15 February: green light decision. If all tests are OK, we proceed; if there are unresolved problems, we can postpone. Prepare OTGs.
      • 22 February: disable write access to CASTOR LHCb in preparation for migration. Check all queues are flushed and everything is written to tape.
      • 1 March: disable CASTOR LHCb. Migrate to CTA.
      • 8 March: EOSCTA LHCb in production

      To Do

      • DAQ functional test
      • Finish testing HTTP+TPC (#209)
      • Publish OTGs
    • 14:10-14:20
      Getting PUBLIC into Production 10m

      Ongoing Tests (Vova)

      • AMS: have hit the 1 million file limit in CASTOR. To be followed up at the SME meeting.
      • NA61/SHINE: the EOS team are assisting with setting up storage at their DAQ. Luca will contact them to set up a meeting. To be followed up at the SME meeting and at tomorrow's x-section meeting.
      • DUNE: they have a CTA test endpoint but have not started testing yet. At one point they said they wanted to migrate to CTA in February, but there is no rush: they will run a data challenge in Q3 2021. See #213.
      • NA62: Test with spinner space, see #72.
      • n_TOF: test offline workflow with spinner space.
      • COMPASS: Recall testing with spinner space.

      To Do

      • Finish repack of public_user
      • "ALICE-like" spinner space on EOS PUBLIC PPS, to test with NA62 and n_TOF (#161)
      • Knowledge Base article explaining how users can access files in CTA; see #214 ("Test and document workflow for retrieving user data from CTA").
      • Migrate data from legacy experiments (Aleph, Chorus, Delphi, Nomad, Opal, ...).
      • Bartek (NA61) is following up on the issue of experiment data stored in the /user part of the CASTOR namespace.


      • 22 February: migrate NA62 to CTA
    • 14:20-14:30
      CTA Status Update 10m
      • #959: preparation for Cedric's departure
      • CTA 3.2-1 to be released this week: superseding superseded (#922), tape lifecycle (#186, #943), max mounts per VO, tape verification hooks (#883)
      • Next: CTA 4.0-1 with v4.0 schema update.
    • 14:30-14:35
      AOB 5m