Ceph/CVMFS/Filer Service Meeting

600/R-001 (CERN)



Zoom: Ceph Zoom

    • 14:00 - 14:15
      CVMFS 15m
      Speakers: Enrico Bocchi (CERN), Fabrizio Furano (CERN)
      • Migration: One repo left!
        • cms.cern.ch migrated this morning
        • lhcbdev.cern.ch is scheduled for Thursday at 9 am
          • This is not a migration but rather a completely new repository
      • Experimenting with autocatalogs on projects.cern.ch
        • When using publisher + gateway, warnings about oversized catalogs are actually treated as errors and fail the transaction
    • 14:15 - 14:30
      Ceph: Operations 15m
      • Incidents, Requests, Capacity Planning 5m
        Speaker: Dan van der Ster (CERN)
      • Cluster Upgrades, Migrations 5m
        Speaker: Theofilos Mouratidis (CERN)



        • We need a monitoring category
        • The Metrictank setup works: installing Grafana afterwards and adding Metrictank as a Graphite datasource does the trick
        • Writing a Puppet config that generates the Metrictank config; this could also become a module
          • ScyllaDB config -- Done
          • Graphite-web backup config -- Done
          • Metrictank config -- In Progress
        • After creating the Puppet config I would start doing benchmarks
          • https://github.com/raintank/fakemetrics is a tool to generate benchmark load on carbon
        • The last step would be to start migrating once things look good enough
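As a rough illustration of the kind of load such a benchmark sends to carbon, the sketch below builds synthetic metrics in the Carbon plaintext format (`path value timestamp`, one per line). It is not how fakemetrics itself works; the metric prefix, the series naming, and the `send_lines` helper are assumptions made for this example, and port 2003 is only the usual default for the plaintext listener.

```python
import socket
import time
from typing import Iterable, List, Optional


def carbon_lines(prefix: str, n_series: int, value: float,
                 timestamp: Optional[int] = None) -> List[str]:
    """Build one Carbon plaintext line per synthetic series."""
    ts = int(time.time()) if timestamp is None else timestamp
    # Hypothetical naming scheme: <prefix>.series<i>.value
    return [f"{prefix}.series{i}.value {value} {ts}\n" for i in range(n_series)]


def send_lines(lines: Iterable[str], host: str, port: int = 2003) -> None:
    """Send the lines over TCP to a plaintext listener (host is an assumption)."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

Calling `carbon_lines("bench", 1000, 1.0)` once per interval and passing the result to `send_lines` would give a simple, fixed-rate load to compare against the dedicated tool.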
      • Hardware Repairs 5m
        Speaker: Julien Collet (CERN)


        • CEPH-1002: New ceph-volume lvm batch command
          • It is indeed idempotent and handles all possible cases of broken OSD procedures.
          • Apparently, it also adds some more explicit error messages
          • Would be very nice to have ASAP
      • Puppet and Tools 5m
    • 14:30 - 14:45
      Ceph: Projects, News, Other 15m
      • Kopano/Dovecot 5m
        Speaker: Dan van der Ster (CERN)
      • REVA/CephFS 5m
        Speaker: Theofilos Mouratidis (CERN)
    • 14:45 - 14:55
      S3 10m
      Speakers: Julien Collet (CERN), Roberto Valverde Cameselle (CERN)
      • CVMFS is not very happy about the "503 Slow Down"
        • Do we have a knob to avoid enabling this on the dedicated RGWs?
        • At the moment, the policy is "retry after some time"
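A minimal sketch of what a "retry after some time" policy could look like on the client side, assuming a capped exponential backoff with jitter. This is not the CVMFS implementation; `request_fn`, the attempt count, and the delay values are hypothetical and chosen only for illustration.

```python
import random
import time
from typing import Callable, Tuple


def retry_on_slow_down(
    request_fn: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call request_fn until it stops returning HTTP 503 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, body = request_fn()
    for attempt in range(max_attempts - 1):
        if status != 503:
            break
        # Capped exponential backoff with jitter, so that many clients
        # hitting the same gateway do not retry in lockstep.
        delay = min(max_delay, base_delay * (2 ** attempt))
        sleep(random.uniform(0, delay))
        status, body = request_fn()
    return status, body
```

The `sleep` parameter is injectable so the policy can be exercised without actually waiting; the final 503 is returned to the caller rather than raised, leaving the decision to fail with the caller.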


      • CEPH-967:
        • S3 quota alerts are in place
        • ceph-guide updated with general procedure (feedback welcome)
      • CEPH-993:
        • After initial testing, the change was pushed to nethub only
        • Need more time to check that this is not too much of a problem for users
          • The 50X plot is buggy
          • Memory usage doesn't seem to have changed on the hosts


      • CEPH-821:
        • The PG count was not a power of 2 on the gabe data pool. The swift users pool also had far too many PGs, so I merged it from 1024 down to 64 PGs. This exposed yet another osdmap trimming bug: https://tracker.ceph.com/issues/48212
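The power-of-2 condition mentioned under CEPH-821 can be checked with a short helper like the one below. It is a sketch for illustration only: the function names are made up here, and the value 1000 in the usage note is an arbitrary example, since the actual PG count of the gabe data pool is not given in these notes.

```python
from typing import Tuple


def is_power_of_two(n: int) -> bool:
    """True if n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def nearest_powers_of_two(n: int) -> Tuple[int, int]:
    """Return the largest power of two <= n and the smallest one >= n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    lower = 1 << (n.bit_length() - 1)
    upper = lower if lower == n else lower << 1
    return lower, upper
```

Both 1024 and 64 pass `is_power_of_two`, while an illustrative count such as 1000 does not; for it, `nearest_powers_of_two(1000)` gives the two candidate targets 512 and 1024.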
    • 14:55 - 15:05
      Filer/CephFS 10m
      Speakers: Dan van der Ster (CERN), Theofilos Mouratidis (CERN)
      • The cephjim cluster was powered off for a switch replacement.
      • itnfs38b: the disk has filled up a few times due to misconfigured MIC user processes writing scratch data to micprojects. Wojciech found a culprit today. As a workaround we had to trim old snapshots, which were still pointing at 1T of deleted files.
      • filer-carbon: Someone wrote 30GB of new metrics there on Friday. I added three new 100GB io1 volumes to the zpool, so there is now ample space for new metrics.
    • 15:05 - 15:10
      AOB 5m