2 volumes missed twice in a row, which triggered a mail. Cleaned itself? Need to check how many actually miss the daily backup, classify (systematic/random errors), then see whether we can either fix stuff or lower the warning threshold [Kuba]. SLD is "daily backup".
one instance of the "uuuafs" user not being known to CASTOR -> Xavi
"corrupted" == 0-size warning mail sent, but found that the backup volume actually was present (and non-0-size). Might have been a really slow "vos backup" (known problematic volume) that was simply still ongoing at the time of the check.
AFS DB server update to 1.6
seems to not have had side effects. ATLAS web server (webafs) was down - several tickets, but that was apparently a web-side overload.
AFS PTS maxid == MAXINT (and subsequent reset) last week - probably unrelated to the AFSDB testing. Instead saw odd messages about a machine trying to join UBIK voting? IP was some IP telephony reserved address, might have been a compromised machine or spoofed IP. Seems to have been rejected but might have had some side effect?
next: puppetification
Monitoring (/u/ops/monitor, LEMON):
new /u/ops/monitor running in test mode on afs257 (parallel to old) - removed SURE and deadwood. If OK: will strip out overlap with LEMON (disk full = have new exception; IPMI), then review missing functionality (offline volumes)
need to check whether the remote test (running on afsdb3?) also uses SURE [Jan]
replace with LEMON remote test (criss-cross "udebug")
(no news regarding AFScm thing on CC7 etc)
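Possible shape of the criss-cross "udebug" remote test, as a minimal sketch only: each DB server probes every other DB server, so no server only checks itself. Assumptions (not from the minutes): the host names afsdb1/afsdb2 are placeholders next to afsdb3, the helper names are made up, and port 7003 (vlserver) is picked as the example target. How the result would be fed into LEMON is left open.

```python
def criss_cross_pairs(servers):
    """Return every (prober, target) pair with prober != target."""
    return [(src, dst) for src in servers for dst in servers if src != dst]


def udebug_command(target, port=7003):
    """Build the udebug command line for one target (7003 = vlserver, assumed)."""
    return ["udebug", "-server", target, "-port", str(port)]


if __name__ == "__main__":
    # Hypothetical server list; only afsdb3 is mentioned in the notes.
    db_servers = ["afsdb1", "afsdb2", "afsdb3"]
    for prober, target in criss_cross_pairs(db_servers):
        # Only prints the plan; actually running it would be done on the prober.
        print(prober, "runs:", " ".join(udebug_command(target)))
```

With three DB servers this gives six probes, so one dead prober still leaves every target covered by another machine.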
VM quota:
still no new VM quota. But can already now request VMs on critical power, which makes sense for AFS. Asked for a new "critical" tenant+quota (on Bernd). Will couple with CEPH-on-critical (needs to go first to ISM) and slowly look at virtualized AFS fileservers