CTA deployment meeting
Thursday 30 Jan 2020, 16:00
ATLAS recall exercise
2017 and 2018 data have been recalled.
Issues that arose during the test:
no blocking errors, but the system still needs babysitting to understand and resolve numerous problems
gfal bug created a lot of noise (now fixed)
root cause of some errors lost in the noise
instrumentation will be improved for the 2019 recalls. We are benefiting from the pause between processing each run (a luxury we won't have in production)
some diagnostic and devops tools still missing (e.g. "cta-admin showqueues" does not show popped jobs)
We are diagnosing problems and providing help to the rest of the group (EOS and FTS teams). In many cases we are reading the source code and contributing the fix.
Putting EOSCTAATLAS into production
"CTA Release v1.1" 31 January
Complete recall test. CASTOR will be restored as the ATLAS endpoint to allow pending calibration data to be written to tape.
Write stress test: 24 February
do we need multi-hop for this? To be checked with Cédric.
Online integration test: (2 March)
One week "cool off" period with no writes to CASTOR, to ensure all files have made it to tape and to check that no further data is being written
ATLAS goes into production and CASTOR files are migrated: date provisionally 16 March (check this does not clash with ATLAS TDAQ milestone tests)
EOS workshop: next week
ATLAS software week: 10-14 Feb
ITUM: 17 Feb
IT/ATLAS coordination meeting
Plans and staffing needs
Responsibilities and knowledge sharing:
CTA software: Frontend / Catalogue / Tape Server / Objectstore
Devops: hardware / systems integration / monitoring
ALICE reprocessing: contention in the disk cache