AFS weekly meeting
31/1-012 (CERN)
AFS service/operation meeting
AFS operations meeting 2014-10-07
present: Kuba, Dan, Jan
AFS Backup:
- 2 volumes missed the backup twice in a row, which triggered a mail. Cleaned itself? Need to check how many volumes actually miss the daily backup, classify the failures (systematic/random errors), then see whether we can either fix things or lower the warning threshold [Kuba]. SLD is "daily backup".
- one instance of the "uuuafs" user not being known to CASTOR -> Xavi
- "corrupted" == "0-size" warning mail sent, but the backup volume was found to be present (and non-0-size). Might have been a really slow "vos backup" (known problematic volume) that was simply still ongoing at the time of the check.
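The classification step in the first item above could be sketched roughly as follows. This is a minimal sketch, assuming a hypothetical per-volume history of daily backup outcomes (True = backup present). The function names, the volume names in the usage note, and the rule "a streak of 2 or more misses counts as systematic" are illustrative assumptions, not the existing check.

```python
from typing import Dict, List


def longest_miss_streak(history: List[bool]) -> int:
    """Return the longest run of consecutive missed days (False entries)."""
    longest = current = 0
    for ok in history:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest


def classify_misses(histories: Dict[str, List[bool]],
                    streak_threshold: int = 2) -> Dict[str, str]:
    """Label each volume as 'ok', 'random' or 'systematic'.

    Assumption: a volume whose longest miss streak reaches
    streak_threshold is treated as systematic; shorter, isolated
    misses are treated as random.
    """
    result = {}
    for volume, history in histories.items():
        streak = longest_miss_streak(history)
        if streak == 0:
            result[volume] = "ok"
        elif streak >= streak_threshold:
            result[volume] = "systematic"
        else:
            result[volume] = "random"
    return result
```

For example, a hypothetical volume with history [False, False, True] would be labelled "systematic", while [True, False, True, False] would be labelled "random". Counting the volumes per label would give the numbers needed to decide between fixing the cause and adjusting the warning threshold.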
AFS DB server update to 1.6
- seems to not have had side effects. ATLAS web server (webafs) was down - several tickets, but that was apparently a web-side overload.
- AFS PTS maxid == MAXINT (and subsequent reset) last week: probably unrelated to the AFSDB testing. Instead saw odd messages about a machine trying to join UBIK voting. The IP was a reserved IP telephony address, so it might have been a compromised machine or a spoofed IP. It seems to have been rejected but might have had some side effect?
- next: puppetification
Monitoring (/u/ops/monitor, LEMON):
- new /u/ops/monitor running in test mode on afs257 (parallel to old) - removed SURE and deadwood. If OK: will strip out overlap with LEMON (disk full = have new exception; IPMI), then review missing functionality (offline volumes)
- need to check whether the remote test (running on afsdb3?) also uses SURE [Jan]
- replace with LEMON remote test (criss-cross "udebug")
- (no news regarding AFScm thing on CC7 etc)
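The criss-cross "udebug" idea above could be planned along these lines. This is a minimal sketch, assuming placeholder server names and that every DB server probes every other one; how the probe is launched remotely (LEMON sensor or otherwise) is left out. The command uses the documented `udebug -server ... -port ...` form with the standard vlserver port 7003. The phrase matched in `is_sync_site` is assumed from typical udebug output and should be verified against the deployed version.

```python
from itertools import permutations
from typing import Dict, List, Tuple

VLSERVER_PORT = 7003  # standard AFS vlserver (VLDB) port


def crisscross_pairs(servers: List[str]) -> List[Tuple[str, str]]:
    """Return (prober, target) pairs so each server checks every other one."""
    return list(permutations(servers, 2))


def udebug_command(target: str, port: int = VLSERVER_PORT) -> List[str]:
    """Build the udebug command line to run on the prober against target."""
    return ["udebug", "-server", target, "-port", str(port)]


def is_sync_site(udebug_output: str) -> bool:
    """Assumption: a sync site reports a line starting 'I am sync site'."""
    return any(line.strip().startswith("I am sync site")
               for line in udebug_output.splitlines())


def exactly_one_sync_site(outputs: Dict[str, str]) -> bool:
    """True if exactly one probed server claims to be the sync site."""
    return sum(is_sync_site(text) for text in outputs.values()) == 1
```

With three placeholder servers this yields six (prober, target) pairs, so the loss of any single prober still leaves every target checked by another host.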
VM quota:
- still no new VM quota. But we can already request VMs on critical power, which makes sense for AFS. Asked for a new "critical" tenant+quota (on Bernd). Will couple with CEPH-on-critical (needs to go first to ISM) and slowly look at virtualized AFS fileservers.