Ceph/CVMFS/Filer Service Meeting

Europe/Zurich
600/R-001 (CERN)

    • 14:00 14:15
      CVMFS 15m
      Speakers: Enrico Bocchi (CERN), Fabrizio Furano (CERN)

      Enrico:

      • Network intervention affecting release managers sw.hsf.org, projects.cern.ch (OTG0058482)
      • CVMFS unaffected by network incidents over last week
      • "Transport endpoint not connected" on the CephFS fuse mount for cvgrid. Unmounting and remounting fixed it, and the software was updated. Increased the puppet run frequency to detect it faster
      • atlas.cern.ch migrated to S3 (OTG0058691)
      • Meeting with LHCb on Wednesday for migration of lhcbdev.cern.ch
      • Deleted old unpacked.cern.ch bucket and release manager
      • cvmfs-2.7.4 should be out this week. It fixes catalog statistics on repositories using the gateway
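
The "Transport endpoint not connected" condition noted above corresponds to the ENOTCONN errno, so a periodic check can detect it by calling stat() on the mount point. A minimal sketch, assuming a hypothetical mount path (/mnt/cephfs) and a hypothetical helper name (mount_is_stale); this is not the check that puppet actually runs:

```python
import errno
import os
import sys


def mount_is_stale(path):
    """Return True if stat() on the mount point fails with ENOTCONN."""
    try:
        os.stat(path)
    except OSError as exc:
        # ENOTCONN is the errno behind "Transport endpoint is not connected".
        if exc.errno == errno.ENOTCONN:
            return True
        # Any other error (e.g. a missing path) is a different problem.
        raise
    return False


if __name__ == "__main__":
    # Hypothetical mount point; pass the real one as the first argument.
    mount_point = sys.argv[1] if len(sys.argv) > 1 else "/mnt/cephfs"
    if mount_is_stale(mount_point):
        print(f"{mount_point}: stale fuse mount, unmount and remount needed")
        sys.exit(1)
```

A non-zero exit status lets a wrapper (cron, puppet exec, monitoring probe) decide whether to remount or only alert.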
    • 14:15 14:30
      Ceph: Operations 15m
      • Incidents, Requests, Capacity Planning 5m
        Speaker: Dan van der Ster (CERN)

        Dan

        • v14.2.11 regressions:
          • An OSD in beesly/critical crashed in the "bluefs allocator". This happens when an fsync and a rocksdb compaction run at the same time.
          • The mon is trimming osdmaps based on the epochs of the "out" osds instead of the "in" osds.
        • I made v14.2.11-1 with fixes for both, and will start testing on dwight after this meeting. We will need to update mons + osds.
        • Capacity:
          • We are asked to provide a list of machines we can retire in 2021. This would normally include all beesly/erin/gabe/flax hardware.
          • We have one more ssd delivery (for block storage) and one more hdd delivery (also for block storage) coming around October.
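
The osdmap trimming regression listed above can be illustrated with a toy model. This is a sketch under stated assumptions, not Ceph's actual code: it assumes the trim floor is the lowest epoch still needed by a selected set of OSDs, and the function names, the per-OSD epoch dictionary, and the numbers are all hypothetical.

```python
def trim_floor(needed_epoch, osd_is_in):
    """Intended behaviour in this toy model: only "in" OSDs hold back trimming."""
    epochs = [e for osd, e in needed_epoch.items() if osd_is_in[osd]]
    return min(epochs) if epochs else None


def regressed_trim_floor(needed_epoch, osd_is_in):
    """Regressed behaviour in this toy model: the "out" OSDs are consulted instead."""
    epochs = [e for osd, e in needed_epoch.items() if not osd_is_in[osd]]
    return min(epochs) if epochs else None


# Hypothetical cluster: osd.2 is "out" and lags far behind the others.
needed_epoch = {0: 120, 1: 118, 2: 90}
osd_is_in = {0: True, 1: True, 2: False}

print(trim_floor(needed_epoch, osd_is_in))            # floor from the "in" OSDs
print(regressed_trim_floor(needed_epoch, osd_is_in))  # floor from the "out" OSDs
```

In this model the two functions disagree whenever the "in" and "out" sets need different epochs, which is why both mons and osds are relevant when checking the fix.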
      • Cluster Upgrades, Migrations 5m
        Speaker: Theofilos Mouratidis (CERN)
      • Hardware Repairs 5m
        Speaker: Julien Collet (CERN)

        Dan

        • New abrt-addon-vmcore was merged on Friday and now we get mails for every possible kernel or hw error.
          • One machine is generating a mail for every Correctable Error, something we don't replace modules for. So I will try to mute this.

        Giuliano

        • Slightly more quiet week with respect to disk replacements
        • There are 4 beesly osds "waiting"
          • 2x drained but still in place
          • 2x ready to be recreated
        • Ceph Disk Operations dashboard for the tracking of ongoing replacements

         

      • Puppet and Tools 5m
        • Puppet 6 tests enabled for hg_ceph and module-ceph. I tried the ai-diff tool on a few machines and it seems OK!
      • JIRA 5m
    • 14:30 14:45
      Ceph: Projects, News, Other 15m
      • Kopano/Dovecot 5m
        Speaker: Dan van der Ster (CERN)
        • First users are migrated to Dovecot already, so we now need to treat "ceph/kopano" like a production cluster.
      • REVA/CephFS 5m
        Speaker: Theofilos Mouratidis (CERN)

        Basic tests are being done on the cephfs module.

        Configuration modified to use the cephfs mount instead.

        Uploaded a file as a test user and can locate it within the mount.

        Compiled and ran the reva client on my local laptop (dev env), logged in as the same user, and listed the file. Download fails for now; it might need a config fix because the download uses localhost as the url.

    • 14:45 14:55
      S3 10m
      Speakers: Julien Collet (CERN), Roberto Valverde Cameselle (CERN)

      Enrico:

      • Checks for puppet6 preparation of it-puppet-module-consul, consul_template, and nomad
      • Both consul and nomad have new releases. We should consider updating (CEPH-954)
       
      Julien
      • New test cluster to see if the upgrade breaks anything when ipv6 is on
        • The upgrade will be performed after this meeting
      • The swift regression is seemingly not affecting us
      • A user complained that one rgw is sometimes slow when pushing to cms-vc-output
        • Could not find anything obvious yet, investigation still ongoing
      • RGWs were put back in the alias following the network intervention this morning
    • 14:55 15:05
      Filer/CephFS 10m
      Speakers: Dan van der Ster (CERN), Theofilos Mouratidis (CERN)

      Enrico:

    • 15:05 15:10
      AOB 5m