no blocking errors, but the system still needs babysitting to understand and resolve numerous problems
gfal bug created a lot of noise (now fixed)
root cause of some errors lost in the noise
instrumentation will be improved for 2019 recalls. We are benefiting from the pause between processing each run (a luxury we won't have in production)
some diagnostic and devops tools still missing (e.g. "cta-admin showqueues" does not show popped jobs)
We are diagnosing problems and providing help to the rest of the group (EOS and FTS teams). In many cases we are reading the source code and contributing the fix.
16:10 → 16:20 Putting EOSCTAATLAS into production (10m)
Milestones:
"CTA Release v1.1": 31 January
Complete recall test. CASTOR will be restored as the ATLAS endpoint to allow pending calibration data to be written to tape.
Write stress test: 24 February
Do we need multi-hop for this? To be checked with Cédric.
Online integration test: (2 March)
One week "cool off" period with no writes to CASTOR, to ensure all files have made it to tape and to check that no further data is being written
ATLAS goes into production and CASTOR files are migrated: date provisionally 16 March (check this does not clash with ATLAS TDAQ milestone tests)
16:20 → 16:30 Communication (10m)
Logo
Website
Upcoming talks:
EOS workshop: next week
ATLAS software week: 10-14 Feb
ITUM: 17 Feb
IT/ATLAS coordination meeting
16:30 → 16:40 Plans and staffing needs (10m)
Responsibilities and knowledge sharing:
CTA software: Frontend / Catalogue / Tape Server / Objectstore
Devops: hardware / systems integration / monitoring