Ceph/CVMFS/Filer Service Meeting

Europe/Zurich
600/R-001 (CERN)

    • 14:00 - 14:15
      CVMFS 15m
      Speaker: Enrico Bocchi (CERN)
      • faser.cern.ch and faser-condb.cern.ch delivered (CC7 + S3)
      • One vanilla squid left (ca05.cern.ch) to be retired
        • Directly accessed by 38 clients
        • Mentioned here (owner already contacted to update the documentation; now fixed)
      • Replacing dedicated ATLAS caches with frontier-squid
      • Sending out emails for migration to CC7 + S3
        • 20 repos for Cloud Refresh campaign
        • Another 12 repos still on SLC6
      • alice-nightlies broken due to the network + Ceph RBD outage two weeks ago
      • Backup Stratum1 (p05151113477545.cern.ch) to be retired
      • How to monitor S3 bucket size and quota?
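
      On the bucket size/quota question, a minimal sketch of one possible check, assuming admin access to an RGW node and that radosgw-admin is available; the bucket name is a placeholder and the JSON field names may differ slightly between Ceph releases:

      ```python
      import json
      import subprocess

      def bucket_stats(bucket):
          """Fetch per-bucket stats from RGW (size, object count, quota)."""
          out = subprocess.check_output(
              ["radosgw-admin", "bucket", "stats", "--bucket=" + bucket])
          return json.loads(out)

      # "cvmfs-faser" is a hypothetical bucket name used only for illustration.
      for bucket in ["cvmfs-faser"]:
          stats = bucket_stats(bucket)
          # Field names below are assumptions to verify against our RGW version.
          usage = stats.get("usage", {}).get("rgw.main", {})
          quota = stats.get("bucket_quota", {})
          print(bucket,
                "%.1f GB" % (usage.get("size_actual", 0) / 1e9),
                usage.get("num_objects", 0), "objects,",
                "quota:", quota)
      ```
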
    • 14:15 - 14:30
      Ceph: Operations 15m
      • Notable Incidents or Requests 5m

        CVE-2020-1699: I have disabled the dashboard module on the clusters where it was enabled (kelly, kopano); a verification sketch follows the release note below.
        CVE-2020-1700: we are running Luminous, but the Luminous beast frontend has the same code path. ceph-12.2.12-0.4.el7 was built in Koji with the patch; the RGWs have been restarted.

        ---------- Forwarded message ---------
        From: David Galloway <dgallowa@redhat.com>
        Date: Fri, Jan 31, 2020 at 10:49 PM
        Subject: v14.2.7 Nautilus released
        To: <ceph-announce@ceph.io>, <ceph-users@ceph.io>, <dev@ceph.io>, <ceph-devel@vger.kernel.org>


        This is the seventh update to the Ceph Nautilus release series. This is
        a hotfix release primarily fixing a couple of security issues. We
        recommend that all users upgrade to this release.


        Notable Changes
        ---------------

        * CVE-2020-1699: Fixed a path traversal flaw in Ceph dashboard that could allow for potential information disclosure (Ernesto Puerta)
        * CVE-2020-1700: Fixed a flaw in RGW beast frontend that could lead to denial of service from an unauthenticated client (Or Friedmann)
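
        A small sketch of how the CVE-2020-1699 mitigation could be verified (or re-applied) per cluster, assuming the ceph CLI with admin credentials; the JSON layout of `ceph mgr module ls` is an assumption to confirm on Luminous:

        ```python
        import json
        import subprocess

        def dashboard_enabled():
            """Return True if the mgr dashboard module is currently enabled."""
            out = subprocess.check_output(
                ["ceph", "mgr", "module", "ls", "--format=json"])
            return "dashboard" in json.loads(out).get("enabled_modules", [])

        if dashboard_enabled():
            # Mitigation applied on kelly/kopano: simply disable the module.
            subprocess.check_call(["ceph", "mgr", "module", "disable", "dashboard"])
            print("dashboard module disabled")
        else:
            print("dashboard module already disabled")
        ```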

      • Repair Service Liaison 5m
        Speaker: Julien Collet (CERN)
      • Backend Cluster Maintenance 5m
        Speaker: Theofilos Mouratidis (CERN)

        ceph/beesly: critical power nodes converted to bluestore
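
        A sketch of how the conversion state could be tracked per backend, assuming the ceph CLI with admin credentials; `osd_objectstore` is the metadata key reported by `ceph osd metadata`:

        ```python
        import json
        import subprocess
        from collections import Counter

        # Count objectstore backends across all OSDs to follow the
        # filestore -> bluestore conversion progress on ceph/beesly.
        meta = json.loads(subprocess.check_output(
            ["ceph", "osd", "metadata", "--format=json"]))
        backends = Counter(osd.get("osd_objectstore", "unknown") for osd in meta)
        print(dict(backends))  # e.g. {'bluestore': 120, 'filestore': 48}
        ```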

    • 14:30 - 14:45
      Ceph: Projects, News, Other 15m
      • Backup 5m
        Speaker: Roberto Valverde Cameselle (CERN)

        • S3 backup running smoothly at a good rate (20 TB/day). I've set up a script to gently and automatically add the remaining users (now ~2.4k left, 12%). I expect this to finish in about one month; at the current rate we will probably double the S3 space used (to around 1.1 PB). Is this feasible?
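
        A back-of-the-envelope check of the numbers quoted above (the assumption being that "doubling" means roughly half of the final ~1.1 PB is still to be copied):

        ```python
        # Rough sanity check of the backup estimate quoted above.
        rate_tb_per_day = 20                # observed backup throughput
        target_total_tb = 1.1 * 1000        # ~1.1 PB if S3 space doubles
        remaining_tb = target_total_tb / 2  # assumption: half still to copy
        days_left = remaining_tb / rate_tb_per_day
        print("~%.0f days left" % days_left)  # ~28 days, i.e. about a month
        ```
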
      • HPC 5m
        Speaker: Dan van der Ster (CERN)

        ceph/jim was upgraded from mimic to nautilus. So far so good.
        We are occasionally seeing an osdc (osd client) deadlock on the kernel client. Restarting the relevant osd unblocks the client. Thread here: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/CKTIM6LF274RVHSSCSDNCQR35PYSTLEK/
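
        One way to spot which OSD a blocked kernel client is waiting on is the osdc file in debugfs; a minimal sketch, assuming debugfs is mounted and that the in-flight request lines carry an `osdN` column (the exact layout varies by kernel version):

        ```python
        import glob
        import re

        # Each kernel-client mount exposes its in-flight OSD requests in debugfs.
        # The column layout differs between kernels, so the regex is an assumption.
        for osdc in glob.glob("/sys/kernel/debug/ceph/*/osdc"):
            with open(osdc) as f:
                osds = re.findall(r"\bosd(\d+)\b", f.read())
            if osds:
                print(osdc, "-> requests pending on OSDs:",
                      sorted(set(osds), key=int))
        ```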

      • Kopano 5m
        Speaker: Dan van der Ster (CERN)

        Since there is still no data in ceph/kopano, we should recreate the filesystem with a replicated default data pool, per https://docs.ceph.com/docs/master/cephfs/createfs/#creating-pools
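
        If we recreate it, the sequence would be roughly the following; a sketch only, where the filesystem and pool names and PG counts are placeholders, the old pools are assumed to be removed separately, and the MDS daemons are assumed to be stopped already (`fs rm` refuses otherwise):

        ```python
        import subprocess

        def ceph(*args):
            """Thin wrapper so the intended command sequence reads clearly."""
            subprocess.check_call(["ceph"] + list(args))

        ceph("fs", "rm", "kopano", "--yes-i-really-mean-it")
        # Replicated default data pool, as recommended by the doc linked above.
        ceph("osd", "pool", "create", "kopano_metadata", "32", "32", "replicated")
        ceph("osd", "pool", "create", "kopano_data", "64", "64", "replicated")
        ceph("fs", "new", "kopano", "kopano_metadata", "kopano_data")
        ```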

      • Upstream News 5m
        Speaker: Dan van der Ster (CERN)
    • 14:45 - 14:55
      S3 10m
      Speakers: Julien Collet (CERN), Roberto Valverde Cameselle (CERN)

      CEPH-818: First nodes for ceph/nethub are just booting. We are testing the SSDs installed by CF, then we will proceed with the full cluster installation.
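
      For the SSD check, a short 4k random-write fio run is a reasonable first pass; a sketch, assuming fio is installed, with a placeholder device path (destructive, so only for the new, empty drives):

      ```python
      import subprocess

      device = "/dev/sdb"  # placeholder: the SSD under test (will be overwritten!)
      subprocess.check_call([
          "fio", "--name=ssd-check", "--filename=" + device,
          "--rw=randwrite", "--bs=4k", "--iodepth=32", "--numjobs=1",
          "--ioengine=libaio", "--direct=1",
          "--runtime=60", "--time_based", "--group_reporting",
      ])
      ```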

      Giuliano:

      • CEPH-811:
        • a script to list RGWs and link them to their hosts
        • next step: puppetize the above so that a simple call to facter does the job (see the sketch after this list)
      • Minor update to the S3 KB following INC2294457 (documentation-related)
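
      The eventual facter fact would be Ruby, but the discovery step behind the CEPH-811 script could look like this Python sketch, assuming the RGWs run as `ceph-radosgw@*` systemd units on the host:

      ```python
      import socket
      import subprocess

      # List the RGW instances running on this host so they can be linked to it.
      out = subprocess.check_output(
          ["systemctl", "list-units", "--type=service", "--state=running",
           "--no-legend", "ceph-radosgw@*"]).decode()
      rgw_units = [line.split()[0] for line in out.splitlines() if line.strip()]
      print({"host": socket.gethostname(), "rgw_instances": rgw_units})
      ```
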
    • 14:55 - 15:05
      Filer/CephFS 10m
      Speakers: Dan van der Ster (CERN), Theofilos Mouratidis (CERN)

      Filer (Dan)

      • INC2303292: A couple of EP MIC clients are unable to connect to our filers. We suspect a network issue, but it isn't obvious.
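
      A trivial reachability probe that could be run from the affected EP MIC clients to separate a network problem from a service problem; the filer hostname is a placeholder, and 111/2049 are the standard rpcbind/NFS ports:

      ```python
      import socket

      filers = ["filer-example.cern.ch"]  # placeholder hostname
      for host in filers:
          for port in (111, 2049):        # rpcbind and NFS
              try:
                  with socket.create_connection((host, port), timeout=5):
                      print("%s:%d reachable" % (host, port))
              except OSError as exc:
                  print("%s:%d NOT reachable: %s" % (host, port, exc))
      ```
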
    • 15:05 - 15:10
      AOB 5m

      (roberto)

      • Migrated some NFS Prometheus metrics that were in the EOS Prometheus instance to our Ceph Prometheus instance. We will migrate some of the alerts too.
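
      A quick check that the migrated NFS metrics actually show up on the Ceph Prometheus instance; the URL and query are placeholders for illustration:

      ```python
      import json
      import urllib.parse
      import urllib.request

      prom = "http://ceph-prometheus.example.cern.ch:9090"  # placeholder URL
      query = "up{job='nfs'}"                                # placeholder query
      url = prom + "/api/v1/query?query=" + urllib.parse.quote(query)
      with urllib.request.urlopen(url, timeout=10) as resp:
          result = json.load(resp)["data"]["result"]
      print("%d series returned for %s" % (len(result), query))
      ```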