CTA deployment meeting

Europe/Zurich
600/R-001 (CERN)
Michael Davis (CERN)

Plan to put EOSCTAATLAS into production and migrate ATLAS from CASTOR to CTA

ATLAS Production Instance

  • See Reference Platform document.
  • Spaces: SSDs split into two physically separate spaces for ARCHIVE and RETRIEVE.
  • Layout: Single replica layout.
  • EOS converters will be DISABLED.
  • One FST per SSD device. 16 SSDs per disk server, one of which is reserved for QuarkDB, i.e. 15 FST daemons per disk server.
  • The CTA Frontend will be configured to accept a maximum file size of 20 GB. The physical maximum with the single-replica layout is the size of one SSD (2 TB). ATLAS say their maximum file size coming from the DAQ is 8 GB, so a 20 GB limit leaves plenty of margin and will warn us if ATLAS file sizes change drastically. (See the sanity-check sketch after this list.)
  • There will be one disk-only directory, /tmp. All other directories that users can write to will be tape-backed.
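A minimal sanity check of the proposed file-size limit, using only the numbers above (8 GB DAQ maximum, 20 GB Frontend limit, 2 TB SSDs); the values are illustrative and easy to update if ATLAS revise their figures:

```python
# Sanity-check the proposed CTA Frontend file-size limit against the
# DAQ maximum and the physical ceiling of a single-replica layout.
# Values taken from the meeting notes; adjust if ATLAS revise them.

TB = 1000**4
GB = 1000**3

daq_max_file = 8 * GB       # largest file ATLAS expect from the DAQ
frontend_limit = 20 * GB    # proposed CTA Frontend maximum file size
ssd_capacity = 2 * TB       # single-replica physical ceiling (one SSD)

assert daq_max_file < frontend_limit < ssd_capacity

print(f"Margin over DAQ maximum: {frontend_limit / daq_max_file:.1f}x")
print(f"Frontend limit is {frontend_limit / ssd_capacity:.1%} of one SSD")
```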

Initial Configuration

  • 2 SSD disk servers: 2×15 SSDs×2 TB = 60 TB total (30 TB for archival/30 TB for retrieval). Max throughput 5 GB/s.
  • 15 tape drives dedicated to ATLAS (out of ~30 CTA tape drives): 5×Lib4, 5×Lib3, 5×LTO
  • Confirm with ATLAS how much data we expect to receive from them this year. (We expect 20 PB in total and estimate ~5 PB of that from ATLAS; TBC. See the back-of-envelope sketch after this list.)
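A back-of-envelope check of the initial configuration against the expected ATLAS volume. The per-drive tape rate (~300 MB/s) is an assumption for illustration, not a figure from the meeting:

```python
# Back-of-envelope: can the initial configuration absorb ~5 PB from ATLAS
# this year? Disk-buffer figures are taken from the notes above; the
# per-drive tape rate is an assumed placeholder, not a measured value.

TB, PB = 1000**4, 1000**5
GB_S = 1000**3                       # bytes per second in 1 GB/s

ssd_total = 2 * 15 * 2 * TB          # 2 disk servers x 15 SSDs x 2 TB
archive_space = retrieve_space = ssd_total / 2
buffer_throughput = 5 * GB_S         # max disk-buffer throughput (from notes)

atlas_volume = 5 * PB                # estimated ATLAS data this year (TBC)
drives = 15                          # tape drives dedicated to ATLAS
drive_rate = 300e6                   # ASSUMPTION: ~300 MB/s per drive

tape_throughput = min(drives * drive_rate, buffer_throughput)
days_to_archive = atlas_volume / tape_throughput / 86400

print(f"SSD buffer: {ssd_total / TB:.0f} TB "
      f"({archive_space / TB:.0f} TB archive / {retrieve_space / TB:.0f} TB retrieve)")
print(f"Aggregate tape rate (capped by buffer): {tape_throughput / 1e9:.1f} GB/s")
print(f"~{days_to_archive:.0f} days of continuous writing for {atlas_volume / PB:.0f} PB")
```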

To Do List

  • Ceph objectstore
  • Oracle DB
  • QuarkDB
  • CTA Frontend
  • Repack instance with enough disk to repack up to 5 × 15 TB tapes in parallel (see the sizing note after this list)
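A quick sizing note for the repack buffer, assuming it must hold the full contents of every tape in flight; the 20% headroom factor is an assumption, not an agreed figure:

```python
# Minimum disk needed to repack 5 x 15 TB tapes in parallel, assuming the
# repack buffer holds the full contents of each tape being repacked.
# The headroom factor is an ASSUMPTION for illustration only.

TB = 1000**4
tapes_in_parallel = 5
tape_capacity = 15 * TB
headroom = 1.2  # assumed margin for overheads and partially drained tapes

minimum_buffer = tapes_in_parallel * tape_capacity
print(f"Bare minimum: {minimum_buffer / TB:.0f} TB")
print(f"With {headroom - 1:.0%} headroom: {minimum_buffer * headroom / TB:.0f} TB")
```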

Documentation, Procedures, Announcements

Julien: add the following documentation to the CTA Tape Ops website:

  • ATLAS production instance: Hardware & software configuration
  • Workflow definitions. Are these going in /proc or in the target directories?
  • ACLs. Where will the !u not-updateable ACL go? (See the illustrative sketch at the end of this section.)
  • Backpressure configuration. Backpressure values TBD.

Michael: SSB announcement & e-mail announcements to ATLAS.
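An illustrative sketch of applying a sys.acl containing the !u (not-updateable) flag via `eos attr set`. The directory and the egroup in the rule are placeholders; which directories actually get the ACL is exactly the open question above:

```python
# Illustrative only: apply a sys.acl containing the '!u' (not-updateable)
# flag so that files cannot be modified once written. The directory path
# and the egroup name are PLACEHOLDERS, not the agreed configuration.

import subprocess

TAPE_BACKED_DIRS = [
    "/eos/ctaatlas/archive",   # placeholder path, not the agreed layout
]

# Example rule: the production egroup may read/write/browse but not
# update existing files ('!u'). The egroup name is a placeholder.
SYS_ACL = "egroup:atlas-production:rwx!u"

for directory in TAPE_BACKED_DIRS:
    # 'eos attr set <key>=<value> <path>' sets an extended attribute on
    # the directory; sys.acl is evaluated for all access decisions below it.
    subprocess.run(
        ["eos", "attr", "set", f"sys.acl={SYS_ACL}", directory],
        check=True,
    )
    print(f"Set sys.acl={SYS_ACL} on {directory}")
```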

Monitoring

  • David: Configure CTA Service Availability metrics
  • David: Check links to Service Portal
  • Create test tickets in Service Portal

Final Testing & Commissioning

  • Julien: Test backup & restore of QuarkDB
  • FTS: check versions of ATLAS production FTS instances
  • Concurrent archive/retrieve/delete test (see the sketch after this list)
  • Backpressure test
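A sketch of what the concurrent archive/retrieve/delete smoke test could look like, driven with xrdcp/xrdfs. The endpoint, test directory, file count and file size are placeholders; a real commissioning run would use the agreed test directories, larger volumes, and would evict the disk replica before the read so the retrieve exercises an actual tape recall:

```python
# Sketch of a concurrent archive/retrieve/delete smoke test against the
# EOSCTAATLAS instance. Endpoint, paths and sizes are PLACEHOLDERS.

import os
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor

ENDPOINT = "root://eosctaatlas.cern.ch"        # placeholder instance endpoint
TEST_DIR = "/eos/ctaatlas/test/commissioning"  # placeholder tape-backed dir
N_FILES = 20
FILE_SIZE = 100 * 1024 * 1024                  # 100 MB test files

def archive_retrieve_delete(i: int) -> None:
    remote = f"{ENDPOINT}/{TEST_DIR}/testfile_{i:04d}"
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(os.urandom(FILE_SIZE))
        local = f.name
    try:
        # Archive: copy into the tape-backed directory (triggers archival).
        subprocess.run(["xrdcp", "--force", local, remote], check=True)
        # Retrieve: read the file back. (A full test would evict the disk
        # replica first so this step drives a real tape recall.)
        subprocess.run(["xrdcp", "--force", remote, local + ".back"], check=True)
        # Delete: remove the remote copy.
        subprocess.run(["xrdfs", ENDPOINT, "rm", f"{TEST_DIR}/testfile_{i:04d}"],
                       check=True)
    finally:
        for path in (local, local + ".back"):
            if os.path.exists(path):
                os.remove(path)

with ThreadPoolExecutor(max_workers=10) as pool:
    list(pool.map(archive_retrieve_delete, range(N_FILES)))
print(f"Completed {N_FILES} concurrent archive/retrieve/delete cycles")
```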

Schedule

  • w/c Mon 25 May: test Fabrice's solution to the metadata rate problem.
  • Mon 1 June: Hardware instance in place. Monitoring hooked up. Tests of instance and monitoring.
  • Mon 8 June: Stop writes and metadata updates to CASTOR ATLAS. Deadline to complete ATLAS repack campaign.
  • Wed 10 June: Test ATLAS migration. Fix DB configuration.
  • Thu 11 June: Migrate ATLAS metadata from CASTOR to CTA.
  • Fri 12 June: Final tests of the CTA instance.
  • Mon 15 June: P-Day. EOSCTAATLAS is live.
  • Mon 29 June: status update at ITUM

Minutes

    • 14:00-14:20  Putting EOSCTAATLAS into production (20m)

      Provisional schedule

      • w/c 25 May: SFO/EOS stress tests (coordinated by Maria)
      • w/c 25 May: test Fabrice's solution to the metadata rate problem (Julien)
      • w/c 8 June: stop writes and metadata updates to CASTOR ATLAS
      • 12 June: migrate CASTOR ATLAS metadata to CTA
      • 15 June: P-Day switch off CASTOR ATLAS

      ATLAS Platform

      • Based on the same platform that we used for the tests in Feb/March: single replica
      • SSDs split into physically separate "archive" and "retrieve" spaces, see Reference Platform document
      • List all other differences from the platform we used for testing
    • 14:20-14:25  AOB (5m)