CTA deployment meeting

Europe/Zurich
31/1-012 (CERN)

Notes from the meeting:

  • FTS and job chaining ("multi-hop"):
    • Multi-hop requires registering the intermediate file in the ATLAS Rucio catalogue.
    • Each multi-hop transfer is a single submission containing two transfers. The job status contains information on the intermediate files, which are to be cleaned up by Rucio.
    • A potential concern is that one FTS job is required for each file to be transferred, so the number of jobs will be multiplied by several orders of magnitude. Are there FTS scalability concerns? It seems not, but this needs to be confirmed by the FTS team.
    • Andrea will contact ATLAS and propose a test plan, including setting up an FTS endpoint.
  • Other FTS items:
    • "activity" propagation to CTA: done
    • "cancel" propagation to CTA: done. (Initially, cancel requests will be logged by cta-frontend but not processed.)
  • Database:
    • DB overview: postponed
    • Encryption key support in DB:
      • Not required for WLCG instances, so not a priority for now.
      • Columns are added to the DB based on confirmed use cases, not merely because the information exists in CASTOR.
  • Garbage Collection:
    • Ready for field testing on ATLAS. Steve and Julien to agree on details (done after the meeting).
  • Status of ATLAS agreed functionality:
    • "activities" (Eric) and intra-VO fairsharing:
      • Eric to agree with ATLAS on the list of activities and configure them via cta-admin. CTA drive ls will show activity information.
      • Will be deployed on ATLAS on 27/5.
    • FIFO queueing (Eric):
      • Will be done at tape pool level.
      • Will be deployed on ATLAS on 3/6.
    • Early failing of retrieves:
      • Check tapes for DISABLED and EXPORTED flags being set.
      • (The need for RDONLY will be reviewed later, in 2020.)
    • Cap for parallel writes:
      • Done via "partial tapes", which replaces CASTOR nbdrives. Eric, Germán and Vlado to clarify the concepts, as these are not yet completely settled.
    • Storage classes and tape pools:
      • Assume O(10) storage classes for ATLAS. Germán to talk to Alessandro about the initial storage classes.
      • CTA ops will map storage classes into tape pools following operational needs.
  • Other functionality:
    • Cancel jobs:
      • Requires additional work in the object store (finding file id / name).
      • To be completed by the end of Q3 this year.
    • CERN RAO:
      • Discussion postponed until after Repack completion.
  • Repack field test plans:
    • Cédric is stress testing on CI and will set up a small SSD-based EOS instance for caching.
    • Presentation scheduled for next week.
  • CTA production setup and website:
    • Production setup:
      • Julien to tag and deploy latest EOS/CTA releases
    • Web site:
  • Status of migration tools:
    • Working on completing code and schema. Checksums: Adler32, but support for other types will be enabled.
    • Architecture of migration system to be presented by Giuseppe & Michael at next meeting.
    • ATLAS CASTOR cleanup progressing. Waiting for decision on file class split between users and ATLAS.
  • Other actions from previous meetings:
    • CTA GPLv3 licensing: Done (Steve)
    • CMS: deletion of pre-offset files: Ongoing (Julien)
    • Containers for dCache: Current containers are highly CERN-specific and for testing/CI purposes. Germán to point dCache people to build yaml files so that they can build CTA themselves.
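
As an illustration of the checksum item under "Status of migration tools" above (Adler32 now, other types to be enabled later), the following is a minimal Python sketch of a checksum lookup that can be extended with further types. It is not taken from the CTA or migration code: the names CHECKSUMS and checksum_hex are hypothetical, and CRC32 is included only as an assumed example of an "other type".

```python
import zlib
from typing import Callable, Dict

# Hypothetical registry of checksum types. Only Adler32 is named in the
# notes; CRC32 is an assumed example of an additional type.
CHECKSUMS: Dict[str, Callable[[bytes], int]] = {
    "ADLER32": lambda data: zlib.adler32(data) & 0xFFFFFFFF,
    "CRC32": lambda data: zlib.crc32(data) & 0xFFFFFFFF,
}


def checksum_hex(data: bytes, kind: str = "ADLER32") -> str:
    """Return the checksum of `data` as an 8-digit lowercase hex string."""
    try:
        fn = CHECKSUMS[kind.upper()]
    except KeyError:
        raise ValueError(f"unsupported checksum type: {kind}") from None
    return f"{fn(data):08x}"


if __name__ == "__main__":
    print(checksum_hex(b"Wikipedia"))
```

Keeping the type name next to the value, for instance as a (kind, hex) pair, is one way to let further checksum types be added later without changing how existing Adler32 values are read.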
    • 14:00 - 15:00
      CTA deployment status (1h)

      • FTS and job chaining clarifications by Andrea (see previous minutes)

      • DB overview (Steve)
      
      • Status of functionality / setup as agreed with ATLAS (see slides from meeting with ATLAS)
          • "activities"
          • FIFO queueing
          • early failing of retrieves
          • storage classes, tape pools
      
      • Status of other core elements
          • cap for parallel writes
          • CERN RAO
      
      • Repack field test plans and timelines
          • Cédric / Vlado
      
      • CTA production setup and website
      
      • Status of migration tools, namespace split-up, test import
      
      • Status of EOS + XROOT releases
      
      • Other actions from previous meeting
          • CTA GPLv3 licensing
          • CMS: Julien to ask CMS to delete/re-create pre-offset files
          • Containers for dCache