
Ceph/CVMFS/Filer Service Meeting

Europe/Zurich
600/R-001 (CERN)

Description

Zoom: Ceph Zoom

    • 14:00 → 14:20
      Ceph: Operations Reports 20m
      • Teo (cta, erin, kelly, levinson) 5m
        Speaker: Theofilos Mouratidis (CERN)
      • Enrico (barn, beesly, gabe, meredith, nethub, vault) 5m
        Speaker: Enrico Bocchi (CERN)

        Barn:

        • Upgraded to 14.2.20

        Beesly:

        • Incident Thursday afternoon OTG0063676
        • OSD recreation issues not yet fully understood (see CEPH-1139)
          • On hold until the cluster is patched with the new version

        Gabe:

        • Decommissioning:
          • 13 machines disowned to procurement
          • 21 remaining: 4 to drain, 6 used by Arthur, 11 ready
          • 1 monster machine is empty! The other 3 will wait for the new version
        • Access logs migrated to new MONIT kafka cluster -- Meeting on Wednesday

        Nethub:

        • One machine drained due to HW intervention (cephnethub-data-fc559c2fdc). Will refill after upgrade.
        • Access logs migrated to new MONIT kafka cluster -- Meeting on Wednesday

        Meredith, Vault: NTR

         

        • All clusters waiting for upgrade to 14.2.20-2
        • Accounting discussion:
          • 1110 users total from radosgw-admin
          • 1002 with uuid-like names (920 with no buckets, 82 with buckets)
          • 109 users returned by `openstack project list --tags-any s3quota`
          • 108 created directly on S3 (e.g., 61 are cvmfs)
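          • A sketch of how such counts could be reproduced (the openstack command is the one quoted above; the radosgw-admin/jq filters are assumptions):

            # total users known to the RGW
            radosgw-admin user list | jq length

            # users whose name looks like an OpenStack project UUID (32 hex characters)
            radosgw-admin user list | jq -r '.[]' | grep -cE '^[0-9a-f]{32}$'

            # users owning at least one bucket
            for uid in $(radosgw-admin user list | jq -r '.[]'); do
              [ "$(radosgw-admin bucket list --uid="$uid" | jq length)" -gt 0 ] && echo "$uid"
            done | wc -l

            # projects tagged for S3 quota in OpenStack
            openstack project list --tags-any s3quota -f value -c ID | wc -l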
      • Dan (dwight, flax, kopano, jim) 5m
        Speaker: Dan van der Ster (CERN)
        • CEPH-1141: 14.2.20-2 built with patches to mgr and mon to avoid negative progress
          • Build is in -testing repos -- only mon/mgr boxes need to be upgraded. (Assuming cluster is already upgraded to 14.2.20).
          • Testing on dwight now.
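          • A minimal sketch of the mon/mgr-only upgrade, assuming the -testing repo with 14.2.20-2 is enabled on those boxes:

            # on each mon/mgr box, pull only the patched daemon packages
            yum update ceph-mon ceph-mgr
            systemctl restart ceph-mon.target ceph-mgr.target

            # confirm the mons/mgrs report 14.2.20-2 while the OSDs stay on 14.2.20
            ceph versions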
        • Flax/Kopano both updated to 14.2.20 last week -- seems stable.
        • Dwight:
          • All RJ machines decommissioned.
      • Arthur 5m
        Speaker: Arthur Outhenin-Chalandre (CERN)
        • Working on rpmci for ceph packages, should have a working version soon
        • Stress-tested the MDS pinning as per CEPH-1138
          • With 7.5M inodes cached, spotted some slow requests, but they are rare and resolved quickly
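          • For reference, the subtree pinning tested in CEPH-1138 is driven by the ceph.dir.pin xattr (path and rank below are illustrative):

            # pin a directory subtree to MDS rank 1
            setfattr -n ceph.dir.pin -v 1 /cephfs/volumes/testdir
            getfattr -n ceph.dir.pin /cephfs/volumes/testdir

            # watch per-MDS load and cached inode counts while generating client traffic
            ceph fs status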
    • 14:20 → 14:30
      Ceph: Operations Tools (ceph-scripts, puppet, monitoring, etc.) 10m
      • CEPH-1139: ceph-volume lvm batch needs to be used with *all* devices as originally created.
        • Bug remains that some LVs for block-dbs appear duplicated in lvs output, but not in lvdisplay. Zapping then recreating the OSD for the duplicated block-db seems to fix it.
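        • A sketch of the zap-and-recreate workaround (OSD id and device names are illustrative; assumes the affected OSD was already removed from the cluster):

          # compare the two views; the phantom block-db LVs appear in lvs but not in lvdisplay
          lvs
          lvdisplay

          # wipe the LVs belonging to the affected OSD
          ceph-volume lvm zap --destroy --osd-id 123

          # re-run batch with *all* devices as originally created, so the layout is recomputed consistently
          ceph-volume lvm batch --report /dev/sd[a-f] --db-devices /dev/nvme0n1
          ceph-volume lvm batch /dev/sd[a-f] --db-devices /dev/nvme0n1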
      • CEPH-1137: Plan for power outage testing. 
      • CEPH-1124: ceph-octopus-1 and one dwight mon have been upgraded to Stream 8. Procedure is simple and works well.
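        • For reference, the stock conversion path (assuming the standard CentOS-documented in-place migration rather than a reinstall):

          # switch the repos from CentOS Linux 8 to CentOS Stream 8 and sync packages
          dnf install centos-release-stream
          dnf swap centos-linux-repos centos-stream-repos
          dnf distro-sync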
    • 14:30 → 14:40
      Ceph: R&D Projects Reports 10m
      • Reva/CephFS 5m
        Speaker: Theofilos Mouratidis (CERN)
      • Disaster Recovery 5m
        Speaker: Arthur Outhenin-Chalandre (CERN)
        • Understood the 4m slowdown on journaling
          • My SSD pool is less performant with this mode
        • Still in discussion with maintainers for my snapshot patch (https://github.com/ceph/ceph/pull/40937)
          • Should land ~soon in master
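        • Assuming this refers to the two rbd-mirror modes, they are selected per image like this (pool/image names are illustrative):

          # journal-based mirroring: every write goes through the journal, hence the extra latency
          rbd mirror image enable mypool/myimage journal

          # snapshot-based mirroring: periodic mirror-snapshots instead of a journal
          rbd mirror image enable mypool/myimage snapshot
          rbd mirror image status mypool/myimage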
    • 14:40 → 14:50
    • 14:50 → 15:05
      CVMFS 15m
      Speakers: Enrico Bocchi (CERN) , Fabrizio Furano (CERN)

      Fabrizio:

      • Nice brainstorming at the Friday meeting about miss rates and ideas for scaling up the cvmfs caches
      • Sent around the results of the hit/miss rate calculations on ca-mey-frontier-0097bb1c18.cern.ch
        • The results show pretty high miss rates and a mix of cvmfs+frontier load that sometimes thrashes the cache, with miss rates peaking above 50% (though low on average)
        • Happy to share the analysis program (Fab); a cross-check would be welcome in case the results are off due to some mistake
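        • A minimal sketch of the kind of counting involved, assuming squid-style access logs on the node (log path is an assumption):

          # field 4 of a native squid access.log is RESULT_CODE/STATUS, e.g. TCP_HIT/200 or TCP_MISS/200
          awk '{ split($4, a, "/");
                 if (a[1] ~ /HIT/) hit++;
                 else if (a[1] ~ /MISS/) miss++ }
               END { printf "hit=%d miss=%d miss_rate=%.1f%%\n",
                     hit, miss, 100*miss/(hit+miss) }' /var/log/squid/access.log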

      Enrico:

      • Fixed the gateway setup for sw.hsf.org, which was crashing due to the number of files in the catalog
      • Would like to salvage the machines in ceph/spare (cephgabe-data...) from decommissioning and use them as a cvmfs backup hot-spare
    • 15:05 → 15:10
      AOB 5m

      Enrico is absent next week