Ceph/CVMFS/Filer Service Meeting

600/R-001 (CERN)



    • 14:00 14:05
      CVMFS 5m
      Speaker: Enrico Bocchi (CERN)
      • Backup is over
        • backup-cvmfs01 is zrep destination for data and homes
        • backup-cvmfs01 snapshots repos on S3
        • cephrestic-backup-01 copies homes on Manila shares to S3
      • Squid proxies spread evenly across availability zones
        • Some of them are old/small
        • Will recreate with 2xlarge machines
      • projects.cern.ch migrated to S3 (also uses the gateway):
        • Multi-tenant repo for intelsw and eng-ci
        • Issue with leases confined to subpaths
        • Developers looking into it
      • Working on migrating custom metrics to collectd
        • Experimenting with cvmfs_whitelist_expire alarm
        • Using lxcvmfs-test for debugging
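As an illustration of what a whitelist-expiry check could compute, here is a minimal sketch. It assumes the text header of a repository's `.cvmfswhitelist` carries the expiry as a line of the form `E` followed by a UTC timestamp `YYYYMMDDhhmmss`, and that a `--` line starts the signature section. The function name, the sample content and the dates are hypothetical and are not taken from the actual collectd plugin. A real whitelist ends in a binary signature, so it would need to be read as bytes and decoded up to the `--` line.

```python
from datetime import datetime, timezone


def whitelist_days_left(whitelist_text, now):
    """Return the days until the whitelist expires (negative if expired)."""
    for line in whitelist_text.splitlines():
        # Assumed format: expiry line is 'E' + YYYYMMDDhhmmss, in UTC.
        if line.startswith("E") and len(line) == 15 and line[1:].isdigit():
            expiry = datetime.strptime(line[1:], "%Y%m%d%H%M%S")
            expiry = expiry.replace(tzinfo=timezone.utc)
            return (expiry - now).total_seconds() / 86400.0
        if line == "--":  # signature section begins; stop scanning
            break
    raise ValueError("no expiry line found in whitelist header")


# Hypothetical whitelist header and reference time, for illustration only.
sample = "20190801120000\nE20190831120000\nNprojects.cern.ch\n--\n"
now = datetime(2019, 8, 7, 12, 0, 0, tzinfo=timezone.utc)
days = whitelist_days_left(sample, now)
print(f"{days:.1f} days left")
```

An alarm would then fire when the returned value drops below a chosen number of days.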
    • 14:05 14:10
      Ceph Upstream News 5m

      Releases, Tickets, Testing, Board, ...

      Speaker: Dan van der Ster (CERN)
    • 14:10 14:15
      Ceph Backends & Block Storage 5m

      Cluster upgrades, capacity changes, rebalancing, ...
      News from OpenStack block storage.

      Speaker: Theofilos Mouratidis (National and Kapodistrian University of Athens (GR))
      • OTG0051602: beesly OSD flapping on the morning of Wednesday, July 31.
        • Trigger: simultaneous filestore splitting on the critical power hosts. Once the number of objects in a PG's XFS directory reaches a threshold, filestore splits the objects into subdirectories. This can take a few seconds on heavily loaded servers, so the OSDs miss some heartbeats and get marked down.
        • Immediate fix: `ceph osd set nodown`, so that OSDs are no longer marked down for missed heartbeats and stop flapping.
        • Permanent fix: increase the filestore split threshold (eca3892cea44cc61a10610b60b105acaae428bd8).
        • This will become a non-issue once beesly is converted to bluestore.
      • In general, latency on beesly is very high (a 4 kB write used to take 10-20 ms and now takes ~100 ms), so few IOPS are left for maintenance such as scrubbing and splitting.
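For reference, the per-directory object count at which filestore splits follows from two OSD options, `filestore_split_multiple` and `filestore_merge_threshold`: a directory is split once it holds more than `16 * filestore_split_multiple * abs(filestore_merge_threshold)` objects. The sketch below evaluates that formula. The first pair of values is what we believe to be the usual defaults. The second pair is purely hypothetical and is not necessarily what the commit above sets.

```python
def split_threshold(split_multiple, merge_threshold):
    """Objects per PG subdirectory above which filestore splits it."""
    return 16 * split_multiple * abs(merge_threshold)


# Assumed defaults: filestore_split_multiple=2, filestore_merge_threshold=10.
default_limit = split_threshold(2, 10)

# Hypothetical raised values, for illustration only.
raised_limit = split_threshold(8, 40)

print(default_limit, raised_limit)
```

Raising the threshold delays the next split rather than removing it, which is consistent with the bluestore conversion being the real long-term fix.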
    • 14:15 14:20
      Ceph Disk Management 5m

      OSD Replacements, Liaison with CF, Failure Predictions

      Speaker: Julien Collet (CERN)
    • 14:20 14:25
      S3 5m

      Ops, Use-cases (backup, DB), ...

      Speakers: Julien Collet (CERN), Roberto Valverde Cameselle (Universidad de Oviedo (ES))
    • 14:25 14:30
      CephFS/HPC/FILER/Manila 5m

      Filer Migration, CephFS/Manila, HPC status and plans.

      Speakers: Dan van der Ster (CERN), Pablo Llopis Sanmillan (CERN)
      • Update on the "loaded dup inode" issue: Yan Zheng suggests running "cephfs-data-scan scan_links" during the next maintenance window. We are currently testing on other clusters to estimate how long it will take to run.
      • FILER: roughly 50% done moving NFS replicas from wigner to st-048-bbd31dee.
    • 14:30 14:35
      HyperConverged 5m
      Speakers: Jose Castro Leon (CERN), Julien Collet (CERN), Roberto Valverde Cameselle (Universidad de Oviedo (ES))
    • 14:35 14:40
      Monitoring 5m
      • Collectd SMART monitoring was broken because only disks with mounted filesystems were monitored, so none of our LVM or journal devices were covered. Eric B pushed a change to qa to monitor all devices: https://its.cern.ch/jira/browse/CRM-3240
      • CEPH-738: Writing a new latency probe to better measure volume write latency (with all write-back caches disabled).
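A minimal sketch of the kind of measurement such a probe could make, not the CEPH-738 implementation: each 4 kB write is followed by `fsync`, so the timing includes the flush to stable storage instead of stopping at a write-back cache. The function name, sample count and file path are hypothetical. The real probe would write to a file on the attached volume under test rather than to a temporary directory.

```python
import os
import statistics
import tempfile
import time


def measure_write_latency(path, count=20, size=4096):
    """Time `count` flushed writes of `size` bytes; return latencies in ms."""
    payload = os.urandom(size)
    latencies_ms = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        for _ in range(count):
            os.lseek(fd, 0, os.SEEK_SET)   # rewrite the same block each time
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)                   # force the data out of any cache
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
    return latencies_ms


# Illustration only: probe a temporary file instead of a real volume.
with tempfile.TemporaryDirectory() as tmp:
    samples = measure_write_latency(os.path.join(tmp, "probe.bin"))
print(f"median {statistics.median(samples):.2f} ms, max {max(samples):.2f} ms")
```

Reporting the median together with the maximum helps separate a uniformly slow cluster from occasional stalls.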
    • 14:40 14:45
      AOB 5m