CTA deployment meeting

Europe/Zurich
31/S-023 (CERN)

ALICE next steps:

  • Space selection: Support for handling multiple spaces and inbound/outbound routes (e.g. SSDs for migrations, spinning disks for recalls). See the ticket created by Steve (link). Once implemented, this will require testing by Julien and Steve.
  • JBOD disk support: No problem from a software perspective, as we are already running in JBOD mode for SSDs. Operational procedures will need to be defined for disabling/replacing broken disks and restaging from tape.
  • HSM
    • (Implicit) prepare handling: Contrary to CASTOR, there is no blocking xrdcp behaviour in CTA. This needs to be discussed with Costin. We also need to understand whether ALICE (JAlien and others) uses multi-file prepare requests [checked by Steve after the meeting: JAlien does not issue multi-file prepare requests]. Steve, Michael and Eric will follow up.
    • GC: The MGM GC can be used here. Space support needs to be coded (ticket to be created by Steve). Another question to be addressed is whether support for persistency (after EOS MGM restarts) is required.
    • Buffer occupancy ("df -h" equivalent) by space: This seems to be provided by xrdfs query space (man page), but is it fully and correctly supported by EOS? Cédric will investigate.
    • WAN connectivity for disk cache: Not a software issue (already working with EOS).
    • There is no real alternative to HSM for the EOSCTAALICE use case. A theoretical alternative that has been mentioned would be to federate EOSALICE (or EOSALICEO2) with EOSCTAALICE and allow direct recalls from tape from the latter to the former. EOS does not provide such functionality: in particular, there is no support for namespace and storage federations that would handle overlay namespaces and transparent data replication between storage elements. In addition, HSM GC would still be needed.
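
For reference, the GC policy named in the agenda ("LRU or equivalent") can be sketched as follows for a single space. This is only an illustration under assumptions, not the MGM GC implementation: the names CachedFile and select_evictions, the high/low watermark values and the on_tape flag (evict only files that already have a tape copy) are all hypothetical and were not discussed in the meeting.

```python
from dataclasses import dataclass


@dataclass
class CachedFile:
    path: str
    size: int           # bytes
    last_access: float  # epoch seconds
    on_tape: bool       # assumption: only files with a tape copy may be evicted


def select_evictions(files, capacity, high_watermark=0.9, low_watermark=0.8):
    """Return the paths to evict from one space, least recently used first.

    Nothing is evicted while usage is at or below high_watermark * capacity.
    Above that, files are evicted in LRU order until usage drops to
    low_watermark * capacity. Files without a tape copy are skipped.
    """
    used = sum(f.size for f in files)
    if used <= high_watermark * capacity:
        return []
    target = low_watermark * capacity
    evicted = []
    for f in sorted(files, key=lambda f: f.last_access):
        if used <= target:
            break
        if not f.on_tape:
            continue  # never drop the only copy of a file
        evicted.append(f.path)
        used -= f.size
    return evicted


# Hypothetical space of 100 bytes holding three files (95 bytes used).
space_files = [
    CachedFile("/eos/a", 40, 1.0, True),
    CachedFile("/eos/b", 30, 2.0, False),
    CachedFile("/eos/c", 25, 3.0, True),
]
to_evict = select_evictions(space_files, capacity=100)
```

The function is stateless: it recomputes the eviction list from the current file list on every call, so it keeps no state that would need to survive a restart. Whether persistency is needed for the real GC remains the open question noted above.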

Prepare - single-file and multi-file behaviour

  • Several proposals were discussed at length; the conclusion was captured by Michael (ticket). GFAL adaptations will be needed as well.
  • We need to understand the underlying problem causing 1% of queue requests to fail (requests are not sent to the CTA front-end). Julien needs to look into how to reproduce this problem consistently in CI (e.g. by limiting the number of threads); it did not show up during ATLAS testing. Julien/Steve to investigate further.

Storage class selection and sys. EOS attributes

  • Agreed that CTA extended attributes in EOS should be protected against direct user manipulation by turning them into system attributes (sys.<name>) protected by EOS. Tickets created/updated by Steve:

CHEP slide review

  • Eric's draft slides were reviewed by the team. The draft slides (prior to review) can be found here. The WIP slides can be found here.

RAL testing issues, round table and AOB

  • Postponed to the next meeting.

 

    • 14:00 14:45
      ALICE next steps 45m

      Now that we have discussed the specs with ALICE, what are the next steps for supporting and testing:
      - space selection (SSD, disk)
      - JBOD disks
      - HSM
        * implicit prepare
        * LRU (or equivalent!) GC
      - buffer occupancy script
      - WAN connectivity

    • 14:45 15:05
      Prepare - single-file and multi-file behaviour 20m

      As in https://gitlab.cern.ch/cta/CTA/issues/635

    • 15:05 15:20
      Storage class selection and sys. EOS attributes 15m
    • 15:20 15:40
      RAL testing issues and required changes 20m
    • 15:40 16:00
      CHEP slide review 20m
    • 16:00 16:20
      Round table and AOB 20m
      • Repack status
      • Other developments
      • ATLAS Data Carousel F2F attendance