AFS weekly meeting

Europe/Zurich
31/1-012 (CERN)

Description
AFS service/operation meeting

AFS weekly meeting 2014-09-27 16:00 (exceptionally)

present: Dan, Kuba, Jan

Incidents last week:

  • afs500 - SAS array lost (and NO_CONTACT, with the operator trying to call people). No volumes on it (yet), so no real impact. Worrying, as it was about to get read-only volumes. Also: Wigner does not have spare parts (incl. no spare disk) for this model, waiting for 4 weeks for replacement. CERN's fault.
  • backup restore server config mixup - caused / (the root filesystem) to fill up on afs253. Need to mark the restore server both in /p/adm/afsadmin.cf (for the actual partition) and /p/etc/afsconf.pl==/p/etc/afsconf.sk (for the clients). Agreed - should unify to define it in a single place. (https://its.cern.ch/jira/browse/AFS-239)
  • Dan asked for a bigger VM quota on OpenStack; we have run out again.

Discussion:

  • CentOS Storage SIG - participate?
    • linked to us using upstream in CC7 - will start with this, somewhat pessimistic whether all patches can be dropped or pushed upstream
    • linked to client patch review - need to do anyway
    • not using Ceph (or Gluster) from that SIG, i.e. no inherent interest to participate otherwise
  • remaining fileservers to go to 1.6.7cern2 - proposed Wed morning [Jan]
  • DBserver 1.6 update - date? [Jan]
    • try for next Wednesday morning, need to put in ITSSB/C5. Can delay if required. Kuba: absent several half-days
      • need plan (tests done/to be done, backup/rollback, changes to BosConfig)
      • should check 'upgrade guide' (Jan: none found, will check release notes)
  • AFS at Wigner (linked to afs500 incident):
    • need to make sure that clients contact the correct DB/fileserver, check "fs getserverpref". Might need explicit config. More of an issue for Meyrin clients accidentally contacting Wigner servers than the other way around.
    • should inform ITUM, worries about "coffee" talk/reputation issues ('AFS is slow'). Jan to send Denise a short statement. Only worried about latency, not RX perf.
  • LEMON: some request to take over AFS-related sensors ("core" = afscm). Agree to look at code (incl migrate to own GIT repo), but don't want to run own repo.
    • SURE will die. Remove the "send()" thing from /u/ops/monitor (still used by the "access" alarm?). Ideally stop it altogether (and just use LEMON sensors, but no progress on this).
  • openafs-1.6.10 test: no progress
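
Note on the "fs getserverpref" check under "AFS at Wigner": a minimal sketch of how the check could be scripted, not an agreed tool. It assumes the command output is one "host rank" pair per line and that a lower rank means a more preferred server. The host names and the set of remote (Wigner) servers are hypothetical; in practice the list would have to be supplied by hand or derived from a site naming convention.

```python
def parse_server_prefs(text):
    """Parse 'host rank' lines into a list of (host, rank) tuples."""
    prefs = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip blank or unexpected lines
        prefs.append((parts[0], int(parts[1])))
    return prefs


def remote_preferred(prefs, remote_hosts):
    """Return remote servers ranked at or below the best local rank.

    Lower rank means more preferred, so these are the remote servers a
    client could pick ahead of (or on a par with) its best local server.
    """
    local_ranks = [rank for host, rank in prefs if host not in remote_hosts]
    if not local_ranks:
        # No local server listed at all: every entry is remote.
        return [host for host, _ in prefs]
    best_local = min(local_ranks)
    return [host for host, rank in prefs
            if host in remote_hosts and rank <= best_local]


# Hypothetical sample input, not real output from a CERN client.
sample = """afs101.example.org 20010
afs102.example.org 20020
afs500.example.org 20005
"""
suspicious = remote_preferred(parse_server_prefs(sample), {"afs500.example.org"})
print(suspicious)
```

An empty result would mean that no remote server outranks the local ones on that client; any host listed would be a candidate for explicit preference configuration.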