Ceph/CVMFS/Filer Service Meeting

Europe/Zurich
R2 (CERN)

Videoconference
Ceph Daily Standup
Zoom Meeting ID
707092061
Host
Dan van der Ster
    • 2:00 PM 2:15 PM
      CVMFS 15m
      Speakers: Enrico Bocchi (CERN), Fabrizio Furano (CERN)
      • Fabrizio 5m
        Speaker: Fabrizio Furano (CERN)
      • Enrico 5m
        Speaker: Enrico Bocchi (CERN)
    • 2:15 PM 2:35 PM
      Ceph Operations Reports 20m
      • Teo (cta, kelly) 5m
        Speaker: Theofilos Mouratidis (CERN)
      • Enrico (barn, beesly, gabe, meredith, nethub, vault) 5m
        Speaker: Enrico Bocchi (CERN)

        Meredith, Vault: NTR

        Barn:

        • Updated to Octopus (15.2.14-7) this morning (OTG0066431)
          • Several slow reqs, otherwise uneventful.

        Beesly:

        • Planned upgrade to Octopus next Monday -- OTG0066572
          • Requires MGRs on C8
            • Creating a puppet manifest/hostgroup for MGRs only today
            • Same work-around to be applied on Vault and Nethub
          • Hardware replacement (OTG0066404) is on hold for now

        Gabe:

        • Thanks, Dan, for fixing inconsistent PG

        Nethub:

        • Draining of old disks for new HW continues (slowly)
          • We may need to pause at some point and upgrade to Octopus
        • New HW (~16PB total) coming in 21Q4 for HA racks. Same cluster, or do we split?

        ----------

        New HW deliveries:

        • 6 racks to Nethub (replacing HB1-07)
        • Other 9 racks coming to 513

        Updated cluster index with UPS power feed:

        • Beesly std/io1, new capacity, CD27-30, UPS3
        • Vault (std/io1, zone 2), SE04-07, UPS1
        • NewCluster (currently cephadm, std/io1, zone 3), CE01-03, UPS2
        • Meredith (RBD on flash, io2, io3) is on UPS2
        • Suggestion from CF is to have new 9 racks in today's RL-B, UPS1
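
        The index above can also be read per power feed. A minimal sketch, assuming the entries are transcribed exactly as listed (no racks were given for Meredith; the proposed placement of the new 9 racks is left out because it is only a suggestion). The names CLUSTER_INDEX and clusters_by_ups are hypothetical and not part of any existing tooling:

```python
from collections import defaultdict

# Cluster index transcribed from the notes above (illustrative only).
# Rack ranges are kept as the strings given in the minutes;
# None means no racks were listed for that cluster.
CLUSTER_INDEX = [
    {"cluster": "Beesly", "racks": "CD27-30", "ups": "UPS3"},
    {"cluster": "Vault", "racks": "SE04-07", "ups": "UPS1"},
    {"cluster": "NewCluster", "racks": "CE01-03", "ups": "UPS2"},
    {"cluster": "Meredith", "racks": None, "ups": "UPS2"},
]


def clusters_by_ups(index):
    """Group cluster names by the UPS feed they depend on."""
    groups = defaultdict(list)
    for entry in index:
        groups[entry["ups"]].append(entry["cluster"])
    # Sort both the feeds and the cluster names for stable output.
    return {ups: sorted(names) for ups, names in sorted(groups.items())}


if __name__ == "__main__":
    for ups, names in clusters_by_ups(CLUSTER_INDEX).items():
        print(f"{ups}: {', '.join(names)}")
```

        Grouping this way shows at a glance which clusters share a power feed, which may help when deciding where the new racks should go.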

        IPMI and root console credentials:

        • IPMI credentials never change (unless the motherboard is replaced)
        • root pwd changes 24/48 hours after it is read. We can set it, though.
      • Dan (dwight, flax, kopano, jim, upstream) 5m
        Speaker: Dan van der Ster (CERN)
        • CEPH-1119: reminded this morning that some cluster mons are still not updated to C8:
          • for VM mons: ai-rebuild while still on nautilus, then upgrade to octopus later
          • for physical: reinstall, or remove the mgr class from hg_ceph/cluster/mon and create some C8 VMs which run the mgr only.
        • CEPH-1228: contrary to what we thought, openshift users *have* noticed the O_APPEND client corruptions because of their usage of non-approved kernel 5.12.7. (Corruption happened in some prometheus db which they simply deleted without contacting us). They are updating to 5.13.10 now.
        • Drupal migration to CephFS seems to be going well. We get a few mds trimming warnings but all seems just transient.
        • CEPH-1229: LinuxSoft had planned a migration from flax to levinson, but this was paused because they require much less quota than available. (Need at least 65TB for all repos and snapshots). The motivation for this move was to improve performance -- we'll review again if a dedicated MDS can achieve the same without needing to move all the data.
        • Two tickets open with LinuxSoft about C8 bugs:
          • efi raid mismatches: INC2926618 (I found a fix -- requires patching grub.cfg)
          • kernel raid warning: INC2926549 (affects all known C8 and CS8 kernels. No evidence that it's not just a harmless warning)
        • Regarding Stream 8 mail:
          • We should restart the migrations to Stream, cautiously at first. All "known" Stream 8 issues impacting Ceph are fixed.
      • Arthur (levinson, pam) 5m
        Speaker: Arthur Outhenin-Chalandre (CERN)
        • CEPH-1215 resharded pam OSD
          • Better rbd 4M write performance apparently
      • Jose (OpenStack) 20m
        Speaker: Jose Castro Leon (CERN)
    • 2:35 PM 2:45 PM
      R&D Projects Reports 10m
      • Reva/CephFS 5m
        Speaker: Theofilos Mouratidis (CERN)
      • Disaster Recovery 5m
        Speaker: Arthur Outhenin-Chalandre (CERN)
        • Submitted my draft PR upstream: https://github.com/ceph/ceph/pull/43359
          • mgolub seems happy about the general design choice
          • Will continue to work on testing

         

      • EOS CephFS Test 5m
        Speaker: Roberto Valverde Cameselle (CERN)
      • Monitoring NG 5m
        Speaker: Aswin Toni (CERN)

        Enabled the rbd stats probe on Dwight cluster

        Submitted the Mgr crash fix upstream https://github.com/ceph/ceph/pull/43384

         

    • 2:45 PM 2:50 PM
      AOB 5m