Ceph/CVMFS/Filer Service Meeting

Europe/Zurich
600/R-001 (CERN)

Description

Zoom: Ceph Zoom

    • 14:00 14:15
      CVMFS 15m
      Speakers: Enrico Bocchi (CERN) , Fabrizio Furano (CERN)

      Enrico

      • Quiet Xmas break:
        • dampe.cern.ch filling up root partition of release manager (RQF1716035)
        • atlas-nightlies struggling to publish big (>100GB) transaction
      • TCP_REFRESH_FAIL_OLD on front caches (CVMFSOPS-245) has not shown up again
      • ams.cern.ch hammering squids since yesterday ~8pm
    • 14:15 14:30
      Ceph: Operations 15m
    • 14:30 14:45
      Ceph: Ongoing Projects 15m
      • Kopano/Dovecot 5m
        Speaker: Dan van der Ster (CERN)
        • Kopano is now off, so we will coordinate with the MAlt team to make sure all data is deleted from the kelly cephfs.
      • REVA/CephFS 5m
        Speaker: Theofilos Mouratidis (CERN)
    • 14:45 14:55
      S3 10m
      Speakers: Enrico Bocchi (CERN) , Julien Collet (CERN)

      Enrico:

      • Expired TLS certificate for nomad:4646 on ceph-consul-cf4a014e3b (https://security-issues.web.cern.ch/incidents/16544)
        • New certificate deployed on Oct 25 (approx 60 days before expiration) but daemon not restarted
        • ceph-consul-cf4a014e3b + ceph-consul-4ae7d67d3a (certificate re-created on Nov 25) rebooted this morning. The new certificate is now used.
        • ceph-consul-0f361e41b0.cern.ch untouched. Certificate expires Apr 14 and we should have the new frontends by then.
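The incident above came from a renewed certificate sitting on disk while the running daemon kept serving the old one, so a check has to look at the certificate the endpoint actually presents. A minimal sketch of such a check follows, assuming Python's standard `ssl` module and a trust store that contains the issuing CA. The function names and the 30-day threshold are hypothetical and not part of the team's tooling.

```python
import socket
import ssl
import time

WARN_DAYS = 30  # hypothetical alert threshold, not taken from the minutes


def served_not_after(host, port, timeout=5.0):
    """Return the notAfter string of the certificate the endpoint serves.

    Assumes the issuing CA is in the default trust store. If the served
    certificate has already expired, the handshake itself raises
    ssl.SSLCertVerificationError, which is also a useful signal.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Days left before a notAfter timestamp such as 'Apr 14 12:00:00 2021 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400.0


def needs_attention(not_after, now=None, warn_days=WARN_DAYS):
    """True when the served certificate expires within warn_days."""
    return days_until_expiry(not_after, now) < warn_days
```

Run against each ceph-consul host on port 4646 after a certificate rollout, a check like this would flag a daemon that still presents the old certificate even though the new file is already in place.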
    • 14:55 15:05
      Filer/CephFS 10m
      Speakers: Dan van der Ster (CERN) , Enrico Bocchi (CERN) , Theofilos Mouratidis (CERN)
      • Levinson has rather high metadata IOPS, coming from background data scrubbing on the EOS/Ceph test cluster. Andreas disabled the scrubbing at the end of last year and will add a throttle.
        • This metadata tuning PR is nearly ready now: https://github.com/ceph/ceph/pull/38574
    • 15:05 15:10
      AOB 5m

      Ceph After Party:

      • New for 2021:
        • Cluster leads: each of us will take lead responsibility for a set of clusters. Monitoring usage, performance, issues, planning upgrades, capacity needs, data migrations, failure domains, hw replacement, etc.
          • Cluster lead: Daily checklist
          • Group: Weekly cluster reviews (CEPH-1025)
          • Teo: cta, erin, kelly, levinson
          • Enrico: beesly, gabe, nethub
          • Dan: dwight, flax, kopano, jim
          • CFCCM (Enrico to start joining)

      • JIRA scrub