Ceph/CVMFS/Filer Service Meeting

Europe/Zurich
600/R-001 (CERN)

    • 14:00 14:05
      CVMFS 5m
      Speaker: Enrico Bocchi (CERN)
    • 14:05 14:10
      Ceph Upstream News 5m

      Releases, Tickets, Testing, Board, ...

      Speaker: Dan van der Ster (CERN)

      Dan:

    • 14:10 14:15
      Ceph Backends & Block Storage 5m

      Cluster upgrades, capacity changes, rebalancing, ...
      News from OpenStack block storage.

      Speaker: Theofilos Mouratidis (National and Kapodistrian University of Athens (GR))

      Teo/Dan:

      • flax BlueStore conversion is ongoing. Now using a faster technique: stop the OSDs, zap them, and recreate them as BlueStore without waiting for them to drain first.
      • One OSD is showing very high usage even though it has few PGs.
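
      The faster conversion order noted above can be sketched as a dry-run command plan. This is a minimal sketch, not the procedure actually used on flax: it assumes the OSDs are managed with ceph-volume and systemd, and the OSD ID and device name are hypothetical. Skipping the drain means the cluster runs with reduced redundancy until backfill onto the recreated OSD completes.

```python
def bluestore_conversion_plan(osd_id, device):
    """Return, in order, the shell commands for converting one OSD in place.

    Assumption: ceph-volume/systemd deployment. Nothing is executed here;
    the function only builds the command strings for review.
    """
    return [
        # Stop the OSD daemon without draining it first.
        f"systemctl stop ceph-osd@{osd_id}",
        # Mark the OSD destroyed so that its ID can be reused.
        f"ceph osd destroy {osd_id} --yes-i-really-mean-it",
        # Wipe the old data and LVM metadata from the device.
        f"ceph-volume lvm zap {device} --destroy",
        # Recreate the OSD as BlueStore under the same ID; backfill refills it.
        f"ceph-volume lvm create --bluestore --data {device} --osd-id {osd_id}",
    ]


if __name__ == "__main__":
    # Hypothetical OSD ID and device, for illustration only.
    for cmd in bluestore_conversion_plan(42, "/dev/sdb"):
        print(cmd)
```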

      Dan: 

    • 14:15 14:20
      Ceph Disk Management 5m

      OSD Replacements, Liaison with CF, Failure Predictions

      Speaker: Julien Collet (CERN)

      Julien:

      • A couple of disk replacements in beesly (ops guide update?); the OSDs are still to be recreated.
      • A couple of disks failing in erin:
        • p05972678u44402/sdv: disk is failing but prophetstore doesn't seem to think it should be changed now...
      • Presentation to repair-service on new disk replacement procedures:
        • Goal is to offload disk replacement procedures to them
        • FDO to provide helper scripts that facilitate the procedure
        • Paul and Remy will be proof-testing the scripts

       

    • 14:20 14:25
      S3 5m

      Ops, Use-cases (backup, DB), ...

      Speakers: Julien Collet (CERN), Roberto Valverde Cameselle (Universidad de Oviedo (ES))

      Julien/Roberto:

      • Set up cosbench to measure actual S3 performance
      • In the process of setting up a couple of VMs that will remain dedicated to benchmarking
        • e.g. evaluating performance variation after updates

      Dan:

      • reshard rm-stale-instances finished on cephgabe0. It removed ~10 million index objects; 13 million are still left over.
    • 14:25 14:30
      CephFS/Manila/FILER 5m

      Filer Migration, CephFS/Manila status and plans.

      Speaker: Dan van der Ster (CERN)

      Dan:

      • With ceph/kelly on mimic, I started testing CephFS snapshots more. There are a few options for integrating with Manila:
        • Manila has a snapshot feature -- users can create a snapshot from the manila API.
        • We could create "ZFS-like" auto snapshots on all manila volumes (e.g. hourly, daily, weekly, ...)
        • To be tested: how would user created snapshots interact with auto snaps
        • To be tested: at which level should we autosnap? CephFS-wide, or individually for each volume?
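
      To make the "ZFS-like" auto-snapshot idea concrete, here is a minimal sketch of tiered retention (hourly, daily, weekly). The policy counts and function names are hypothetical, and the sketch only decides which snapshot timestamps to keep or prune; it does not touch CephFS or Manila.

```python
from datetime import datetime, timedelta

# Hypothetical policy: number of most recent periods to retain per tier.
POLICY = {"hourly": 24, "daily": 7, "weekly": 4}


def bucket(ts, tier):
    """Return the period label that a timestamp falls into for a tier."""
    if tier == "hourly":
        return ts.strftime("%Y-%m-%d %H")
    if tier == "daily":
        return ts.strftime("%Y-%m-%d")
    if tier == "weekly":
        iso = ts.isocalendar()  # (ISO year, ISO week, weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    raise ValueError(f"unknown tier: {tier}")


def snapshots_to_keep(timestamps, policy=POLICY):
    """Keep the newest snapshot in each of the most recent periods per tier."""
    keep = set()
    newest_first = sorted(timestamps, reverse=True)
    for tier, count in policy.items():
        seen = []
        for ts in newest_first:
            label = bucket(ts, tier)
            if label not in seen:
                seen.append(label)
                keep.add(ts)  # first hit is the newest snapshot in this period
            if len(seen) >= count:
                break
    return keep


def snapshots_to_prune(timestamps, policy=POLICY):
    """Return the snapshots that no tier wants to retain, oldest first."""
    return sorted(set(timestamps) - snapshots_to_keep(timestamps, policy))


if __name__ == "__main__":
    start = datetime(2019, 1, 7)
    hourly = [start + timedelta(hours=i) for i in range(72)]
    small_policy = {"hourly": 2, "daily": 2, "weekly": 1}
    for ts in sorted(snapshots_to_keep(hourly, small_policy)):
        print("keep", ts.isoformat())
```

      User-created snapshots could be excluded from pruning by only passing timestamps of automatically created snapshots to this logic; how that interacts with Manila's own snapshot feature remains one of the open questions above.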
    • 14:30 14:35
      HPC 5m

      Performance testing, HPC storage status and plans

      Speakers: Alberto Chiusole (Universita e INFN Trieste (IT)), Pablo Llopis Sanmillan (CERN)

      Benchmark results on Ceph /bescratch at CERN: https://gistpreview.github.io/?a8fbb37b6d07f841297fcce9500ccdbe

    • 14:35 14:40
      HyperConverged 5m
      Speakers: Jose Castro Leon (CERN), Julien Collet (CERN), Roberto Valverde Cameselle (Universidad de Oviedo (ES))

      Jose:

      • Tunables do not seem to affect performance; shall I put the default flags everywhere in Cinder?
      • Fast-diff can be enabled later on any existing image; it may require an object map rebuild afterwards
      • Preliminary tests with client caches showed no performance gains (weird)
        • Julien will check different configurations of the cache more extensively
        • In case the client cache does not make a difference, we could try to increase the OSD memory limit
    • 14:40 14:45
      Monitoring 5m

      Julien:

      • Prophetstore monitoring now spans all the EC rows of erin.
      • Trial period extended

      Question: the Prometheus alerts seem to flap on and off. Is there some config to fix this?

      Meeting after this meeting about the KPI/SLI dashboard with L. Magnoni.

      Roberto:

      • Repeating Firing/Resolved alerts on cepherin due to the active mgr flapping (will enable/disable the prometheus module to see if it helps).
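
      On the flapping question, one common damping approach is to fire an alert only after its condition has held for several consecutive evaluations, which is the idea behind the `for` clause in Prometheus alerting rules. The sketch below illustrates that behaviour on a made-up series of evaluations; the function name and sample data are hypothetical, and whether this would help on cepherin is untested, since the flapping there is attributed to the active mgr changing.

```python
def debounced_alerts(samples, hold):
    """Return the alert state per evaluation.

    The alert fires only once the condition has been true for `hold`
    consecutive evaluations, and resolves as soon as it is false.
    """
    states = []
    streak = 0
    for failing in samples:
        streak = streak + 1 if failing else 0
        states.append(streak >= hold)
    return states


def transitions(states):
    """Count Firing/Resolved changes, i.e. how many notifications result."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


if __name__ == "__main__":
    # Made-up evaluations: the condition flaps, then stays true for a while.
    raw = [True, False, True, False, True, True, True, False]
    damped = debounced_alerts(raw, hold=3)
    print("raw transitions:   ", transitions(raw))
    print("damped transitions:", transitions(damped))
```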
    • 14:45 14:50
      AOB 5m