CTA development meeting

Europe/Zurich
600/R-001 (CERN)

Michael Davis (CERN)

Issue Backlog

Tool to resubmit jobs from the failed queue

  • The tool to resubmit failed jobs is not production-grade, but it is needed for production workflows.
  • #898 "Requester ID is hardcoded in cta-send-event" is tagged as workflow:Accepted.
  • Julien has written #993 "Reinjection operation tools for operators" to describe what other functionality is missing from the cta-send-event script. It is tagged as workflow:Accepted.

Tape Server

  • The tape server encryption implementation is working. Vova has tested all the steps by hand. He will document the full procedure so we can review it.
  • There is a set of tape-encryption tools in https://gitlab.cern.ch/tape-operations/tape-encryption-control. These are needed for this workflow, so they should be ported to CTA.
  • Currently these tools are packaged with other CASTOR tools in the CERN-CC-TapeAdmin-tools RPM. (See the note about this RPM on this page: https://tapeoperations.docs.cern.ch/legacy_castor_tapeops_twiki/TapeOperationsCTATools/). The problem is that this RPM contains some CASTOR tools which have been replaced by CTA equivalents, plus some tools which have no CTA equivalent. We need to identify the tools in this package which are missing in CTA and port/repackage them properly for CTA.

Drive Statuses

  • Space reservation: our understanding is that a drive can only ever have one active disk space reservation. If a queue is sleeping there should be no reservation for that queue. The reservation is made only for an active queue which is being recalled. Therefore there is a 1:1 relation between the drive state table and disk reservation table, so they can be combined.
  • The key:value structure of the drive config table doesn't make much sense from a relational DB point of view but we will keep it for now. This solution is preferred over the binary blob solution as it will allow us to query the configuration of all drives from within the DB, should we want to.
  • The Drive State Pointer table is an ObjectStore implementation detail and can be removed.
  • This leaves us with two new tables: Drive Configuration (updated when the tape server starts) and Drive State (updated at regular intervals or when the drive changes state).
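The two-table layout described above can be sketched roughly as follows, using Python's built-in sqlite3 module purely for illustration. All table names, column names and sample values here (DRIVE_CONFIG, DRIVE_STATE, RESERVED_BYTES, and so on) are hypothetical and are not taken from the CTA schema, and the production database is not SQLite. The sketch shows the key:value configuration table being queried across all drives, and the single disk space reservation folded into the drive state row.

```python
import sqlite3

# Hypothetical schema: names and columns are illustrative only,
# not the actual CTA database schema.
SCHEMA = """
CREATE TABLE DRIVE_CONFIG (
    DRIVE_NAME TEXT NOT NULL,
    KEY_NAME   TEXT NOT NULL,
    VALUE      TEXT,
    PRIMARY KEY (DRIVE_NAME, KEY_NAME)
);
CREATE TABLE DRIVE_STATE (
    DRIVE_NAME           TEXT PRIMARY KEY,
    STATUS               TEXT NOT NULL,
    -- At most one active reservation per drive, so it lives in the state row.
    RESERVED_DISK_SYSTEM TEXT,
    RESERVED_BYTES       INTEGER NOT NULL DEFAULT 0
);
"""


def make_db():
    """Create an in-memory database holding the two sketch tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def store_config(conn, drive, config):
    """Replace a drive's configuration, as would happen when the tape server starts."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM DRIVE_CONFIG WHERE DRIVE_NAME = ?", (drive,))
        conn.executemany(
            "INSERT INTO DRIVE_CONFIG (DRIVE_NAME, KEY_NAME, VALUE) VALUES (?, ?, ?)",
            [(drive, key, value) for key, value in config.items()],
        )


def update_state(conn, drive, status, disk_system=None, reserved_bytes=0):
    """Insert or overwrite the single state row for a drive."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO DRIVE_STATE VALUES (?, ?, ?, ?)",
            (drive, status, disk_system, reserved_bytes),
        )


def config_value_for_all_drives(conn, key):
    """Return {drive name: value} for one configuration key across all drives."""
    rows = conn.execute(
        "SELECT DRIVE_NAME, VALUE FROM DRIVE_CONFIG WHERE KEY_NAME = ? "
        "ORDER BY DRIVE_NAME",
        (key,),
    )
    return dict(rows.fetchall())
```

The last function illustrates the argument made for key:value rows over a binary blob: a single SELECT can compare one setting across every drive, which would not be possible if each drive's configuration were stored as an opaque blob.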

Build, Packaging and Distribution

  • Aurelien has built the CERN-dependency-free version of the RPMs. He will create a new "CTA public" repo which will be used to publish these.
  • The RPMs will be the same as the ones we have in our internal repo, but the "public stable" version may not be the same version we are deploying internally.

AOB

  • All drives went down during the DB upgrade on Wednesday; the tape server should be more tolerant of DB outages. Steve created #994 "cta-taped daemon should not put itself permanently down when there is a database problem" to remind us to come back to this. (It will not be addressed in the short term.)
    • 09:15 - 09:25
    • 09:25 - 09:35
      Tape Server 10m
      • Encryption for CASTOR backup use cases
      • #980 cta-taped can not handle missing encryption key correctly
    • 09:35 - 09:45
      Drive Statuses 10m
      • #976 Move storage of persistent drive states to the DB
      • #988 Instrument backpressure for retrieve in dr ls
    • 09:45 - 09:55
      SchedulerDB 10m
      • Unit tests in CI need an external DB
    • 09:55 - 10:05
      CTA Build, Packaging and Distribution 10m
      • Aurelien is working on cleaning up dependencies in CTA RPMs
      • #771 External access to CTA CI workflow
      • #983 CTA running on public repositories RPMs
      • Codi for discussions
    • 10:05 - 10:10
      AOB 5m
      • Backpressure - Julien will document how this is being addressed