Cédric expects to finish the schema validation tool by end of next week.
Management monitoring must be in place before we go to production. For other operations monitoring, we have the essentials; more can be added later according to demand and use cases.
Management monitoring consists of 2 plots:
Volume sent to tape, with data in Grafana / InfluxDB, for both CASTOR and CTA.
Data in CASTOR. This shows the amount of ‘live’ (undeleted) data in CASTOR, extracted from the CASTOR NS. An equivalent plot needs to be created for CTA (extracting the information from the CTA catalogue), and as above, we also need an overview plot with the sum of CASTOR + CTA data.
We agreed that the statistics for the "Data in CASTOR" plot should not be part of the CTA schema, because non-CERN sites will do monitoring differently from us. They will be stored in the existing MySQL DB which is already used for CTA monitoring (it currently holds only one table, populated by David's code).
(Note from ITMM Minutes: Regarding retention periods, note that all log data sent to the Monitoring Service will be deleted after 13 months unless there is an explicit request to retain.)
The following actions need to be allocated:
We reviewed, prioritized and allocated the "testing" tickets.
Bottom line: we have to repack 42 PB of data before Run-3. The ATLAS portion is ~10 PB (1500 × 7 TB tapes).
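As a quick sanity check of the figures above, the ATLAS portion can be recomputed from the tape count. This is only a sketch: it assumes decimal units (1 PB = 1000 TB) and that every tape is completely full, neither of which is stated in the minutes.

```python
# Assumptions (not from the minutes): decimal units and completely full tapes.
TB_PER_PB = 1000

atlas_tapes = 1500        # number of ATLAS tapes to repack
tape_capacity_tb = 7      # capacity per tape, in TB
total_repack_pb = 42      # total volume to repack before Run-3, in PB

# Upper bound on the ATLAS volume and its share of the total repack.
atlas_pb = atlas_tapes * tape_capacity_tb / TB_PER_PB
atlas_fraction = atlas_pb / total_repack_pb

print(f"ATLAS: {atlas_pb:.1f} PB ({atlas_fraction:.0%} of {total_repack_pb} PB)")
```

Under these assumptions the ATLAS tapes hold at most 10.5 PB, consistent with the ~10 PB quoted and roughly a quarter of the total.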
We should not immediately reclaim CASTOR tapes which are repacked in CTA, as this removes the possibility of rolling back the files to CASTOR. After the migration there will be a moratorium on reclaiming tapes for at least a few months.
There are no issues in CTA which are blocking us from restarting the repack tests, but we do need the EOS CLOSEW fixes (see below).
Zero-length files must be allowed in EOSCTA as the experiments use them for tagging directories with metadata. Giuseppe added a procedure to import zero-length files from CASTOR (not yet tested).
FTS should report zero-length files as "safely on tape" to avoid workflow errors. We need to define what a valid zero-length file is: there is currently a difference between files created with CREAT and files created with touch (the ADLER checksum can be 0 or 1).
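For reference when defining validity: assuming the ADLER checksum here is Adler-32, its value for empty input is 1 (00000001 in hex), because the algorithm starts from 1 rather than 0. The short sketch below confirms this with Python's zlib. The helper is_valid_zero_length_checksum is hypothetical and not part of EOS, CTA or FTS; it only illustrates one possible rule that tolerates both values until the definition is agreed.

```python
import zlib

# Adler-32 is initialised to 1, so the checksum of empty input is 1.
EMPTY_ADLER32 = zlib.adler32(b"")


def is_valid_zero_length_checksum(checksum: int) -> bool:
    """Hypothetical rule: accept either 0 or the true Adler-32 of empty data."""
    return checksum in (0, EMPTY_ADLER32)


print(f"Adler-32 of empty input: {EMPTY_ADLER32:08x}")
```

Whether a checksum of 0 should be accepted at all, or normalised to 1 on import, is exactly the open question noted above.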
Eddie reports that all WLCG sites have been requested to upgrade their dCache instances to version 5.2.* and to enable Storage Resource Reporting (SRR) before the end of March 2020.
From the WLCG Ops minutes:
17 sites are already running version 5.2.*, 25 to be upgraded
SRR is still to be enabled at almost all sites; only JINR has enabled it so far.
This week all sites will be ticketed, either for upgrade to 5.2.* or for SRR.
Reminder: section lunch is on Tue 17 Dec at 12:00.