AFS weekly meeting
31/1-012 (CERN)
AFS service/operation meeting
AFS weekly meeting 2014-09-27 16:00 (exceptionally)
present: Dan, Kuba, Jan
Incidents last week:
- afs500 - SAS array lost (and NO_CONTACT, and operator trying to call people). No volumes (yet), so no real impact. Worrying, was about to get readonly volumes. Also: Wigner does not have spare parts (incl no spare disk) for this model, waiting for 4 weeks for replacement. CERN's fault.
- backup restore server config mixup - caused / full on afs253. Need to mark restore server both in /p/adm/ (for the actual partition) and /p/etc/ (for the clients). Agree - should unify to define in a single place.
- Dan asked for bigger VM quota on OpenStack, have run out again.
- CENTOS Storage SIG - participate?
  - linked to us using upstream in CC7 - will start with this, somewhat pessimistic whether all patches can be dropped or pushed upstream
  - linked to client patch review - need to do anyway
  - not using Ceph (or Gluster) from that SIG, i.e. no inherent interest to participate otherwise
- remaining fileservers to go to 1.6.7cern2 - proposed Wed morning [Jan]
- DBserver 1.6 update - date? [Jan]
  - try for next Wednesday morning, need to put in ITSSB/C5. Can delay if required. Kuba: absent several half-days
  - need plan (tests done/to be done, backup/rollback, changes to BosConfig)
  - should check 'upgrade guide' (Jan: none found, will check release notes)
- AFS at Wigner (linked to afs500 incident):
  - need to make sure that clients contact the correct DB/fileserver, check "fs getserverpref". Might need explicit config. More of an issue for Meyrin clients accidentally contacting Wigner servers than the other way around.
  - should inform ITUM, worries about "coffee" talk/reputation issues ('AFS is slow'). Jan to send Denise some short statement. Only worried about latency, not RX perf.
- LEMON: some request to take over AFS-related sensors ("core" = afscm). Agree to look at code (incl migrate to own GIT repo), but don't want to run own repo.
- SURE will die. Remove the "send()" thing from /u/ops/monitor (still used by "access" alarm?). Ideally stop it altogether (and just use LEMON sensors, but no progress on this).
- openafs-1.6.10 test: no progress
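
Note on the "fs getserverpref" check from the Wigner item: a minimal sketch of how the client-side check could be scripted. It assumes the command output is one "hostname rank" pair per line, with a lower rank meaning a more preferred server. The hostnames and the function names are hypothetical, and actually running the command on a client is left out; the sample text stands in for its output.

```python
def parse_server_prefs(output):
    """Parse 'hostname rank' lines into (host, rank) tuples; skip anything else."""
    prefs = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # blank or unexpected line
        prefs.append((parts[0], int(parts[1])))
    return prefs


def remote_servers_preferred(prefs, remote_hosts):
    """Return remote hosts ranked at least as preferred as the best local server.

    remote_hosts is a set of hostnames assumed to be at the other site.
    """
    local_ranks = [rank for host, rank in prefs if host not in remote_hosts]
    if not local_ranks:
        # No local server listed at all: every remote host is a concern.
        return sorted(host for host, _ in prefs if host in remote_hosts)
    best_local = min(local_ranks)
    return sorted(
        host for host, rank in prefs
        if host in remote_hosts and rank <= best_local
    )


# Hypothetical sample output (not taken from a real client).
SAMPLE = """\
afs-local-1.example.org      20007
afs-local-2.example.org      20012
afs-remote-1.example.org     20003
afs-remote-2.example.org     40010
"""

if __name__ == "__main__":
    remote = {"afs-remote-1.example.org", "afs-remote-2.example.org"}
    for host in remote_servers_preferred(parse_server_prefs(SAMPLE), remote):
        print("remote server preferred over local ones:", host)
```

Any host reported by such a check would be a candidate for the explicit preference config mentioned above.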