Ceph/CVMFS/Filer Service Meeting

Europe/Zurich
600/R-001 (CERN)

    • 14:00 14:05
      CVMFS 5m
      Speaker: Enrico Bocchi (CERN)

      Enrico:

      • New repos: compass-mc.cern.ch, atlas-nightlies.cern.ch
      • $HOME on CephFS for repos on S3
      • New script to gather repo statistics in prod
      • ca-proxy squids vs. frontier-squid
      • Documentation for the CVMFS volume plugin (RQF1240024)
      • Puppet 5 roll-out: catalog diff for lxcvmfs-cc7test against the devel environment shows no differences
      [root@lxcvmfs-cc7test ~]# ai-catalog-diff -t devel
      Backuping stable catalog into /tmp/tmp.Cf2tdQxcLa.json...
      Downloading the catalog from a development Puppet master...
      (server: it-puppet-masters-public-a.cern.ch port: 8164)
      Saved catalog to /tmp/tmp.qE0NiIjssm/catalog/lxcvmfs-cc7test.cern.ch.json
      Comparing catalogs...
      (/tmp/tmp.Cf2tdQxcLa.json vs /tmp/tmp.qE0NiIjssm/catalog/lxcvmfs-cc7test.cern.ch.json)
      I, [2019-03-04T13:49:17.389505 #14779]  INFO -- : Catalogs compiled for lxcvmfs-cc7test.cern.ch
      I, [2019-03-04T13:49:18.596911 #14779]  INFO -- : Diffs computed for lxcvmfs-cc7test.cern.ch
      I, [2019-03-04T13:49:18.597020 #14779]  INFO -- : No differences
      Removing temporary files and directories...
      Summary:
          No differences have been found :)
      Suggested next steps:
          [*] Apply the catalog: puppet agent -t --masterport 8164
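      The "Comparing catalogs..." step above boils down to a resource-by-resource comparison of two compiled catalogs. The following is a minimal, hypothetical Python sketch of such a diff, not ai-catalog-diff itself. It assumes each catalog JSON has a top-level "resources" list whose entries carry "type", "title" and optional "parameters" keys. The script name and output markers are illustrative.

```python
# Hypothetical resource-level diff of two Puppet catalog JSON files.
# Assumption: each catalog has a top-level "resources" list whose entries
# carry "type", "title" and (optionally) "parameters" keys.
import json
import sys


def index_resources(catalog):
    """Map (type, title) -> parameters dict for every resource in a catalog."""
    return {
        (res["type"], res["title"]): res.get("parameters", {})
        for res in catalog.get("resources", [])
    }


def diff_catalogs(old, new):
    """Return sorted lists of added, removed and changed (type, title) keys."""
    a, b = index_resources(old), index_resources(new)
    added = sorted(b.keys() - a.keys())
    removed = sorted(a.keys() - b.keys())
    changed = sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
    return added, removed, changed


def main(old_path, new_path):
    with open(old_path) as f_old, open(new_path) as f_new:
        added, removed, changed = diff_catalogs(json.load(f_old), json.load(f_new))
    if not (added or removed or changed):
        print("No differences")
        return 0
    # "+" added, "-" removed, "~" parameters changed
    for marker, keys in (("+", added), ("-", removed), ("~", changed)):
        for rtype, title in keys:
            print(f"{marker} {rtype}[{title}]")
    return 1


# Usage (hypothetical): python catalog_diff.py STABLE.json DEVEL.json
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

      Keying resources on (type, title) mirrors how Puppet identifies a resource, so a changed parameter shows up as "~" rather than as a remove/add pair.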
      
    • 14:05 14:10
      Ceph Rota Report 5m

      Julien:

      • Proposal: upon receiving an "Uncorrectable Sectors" email, drain the affected device and try to fix it; if the fix works, recreate the OSD, otherwise request a replacement. The draft procedure is being finalised (and scripted). Test device: sdc on p06253939q78941 (has an offline uncorrectable sector but has not failed yet; waiting for smartctl -t to finish).
      • SCSI errors: disk replaced, OSD ready to be recreated (INC1916035, INC1920905)
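      The decision logic of the proposed "Uncorrectable Sectors" procedure could be sketched as below. This is not the draft script, which the minutes do not show. It makes two assumptions. First, the input is the text of `smartctl -A` for an ATA disk. Second, a fix counts as successful when both Current_Pending_Sector and Offline_Uncorrectable read zero afterwards. The action names are hypothetical labels.

```python
# Hypothetical decision helper for the draft "Uncorrectable Sectors" procedure.
# Assumptions: input is the text of `smartctl -A /dev/sdX` for an ATA disk, and
# a fix counts as successful when both counters below read zero afterwards.
WATCHED = ("Current_Pending_Sector", "Offline_Uncorrectable")


def parse_smart_attributes(text):
    """Return {ATTRIBUTE_NAME: raw integer value} from a smartctl -A table."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID; RAW_VALUE is the 10th column.
        if len(parts) >= 10 and parts[0].isdigit():
            try:
                attrs[parts[1]] = int(parts[9])
            except ValueError:
                pass  # skip raw values that are not plain integers
    return attrs


def next_action(attrs, fix_attempted):
    """Map the SMART counters to the next step of the proposed procedure."""
    bad = sum(attrs.get(name, 0) for name in WATCHED)
    if not fix_attempted:
        # Email received: drain the device and try to fix it.
        return "drain-and-fix" if bad else "no-action"
    # After the fix: recreate the OSD if it worked, otherwise ask for a new disk.
    return "recreate-osd" if bad == 0 else "request-replacement"
```

      Keeping the SMART parsing separate from the decision makes the decision logic testable without access to a real device.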

       

    • 14:10 14:15
      Ceph Upstream News 5m

      Releases, Tickets, Testing, Board, ...

      Speaker: Dan van der Ster (CERN)

      Dan

      • Mimic 13.2.5 in final QA testing; release expected this week.
      • Nautilus RC 14.1.0 released; final 14.2.0 expected within two weeks.
      • Board meeting on Friday 1 March; notes at: https://pad.ceph.com/p/board-meeting-notes
    • 14:15 14:20
      Ceph Backends 5m

      Upgrades, capacity changes, rebalancing, ...

      Speaker: Theofilos Mouratidis (National and Kapodistrian University of Athens (GR))
    • 14:20 14:25
      S3 5m

      Ops, Use-cases (backup, DB), ...

      Speakers: Julien Collet (CERN), Roberto Valverde Cameselle (Universidad de Oviedo (ES))

      From Roberto: 

      We need to define a short-term/long-term backup strategy for CephFS:

      • Backup/restore/verification by the user (restic + S3 bucket; some documentation already exists)
      • Backup/verification managed by us, with restore exposed to the user (using our instrumentation). This is similar to the prototype we are testing for CERNBox (cbox-mon), which is not ready yet.

      Ismael Posada (IT-CDA-WF) would be interested in trying whatever we have.
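      For the user-driven option (restic + S3 bucket), the workflow is init once, then back up, verify and restore. A minimal, hypothetical sketch that only builds the restic command lines is shown below. The endpoint, bucket name and paths are placeholders and do not come from the existing documentation. Credentials are assumed to be supplied through restic's usual environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, RESTIC_PASSWORD).

```python
# Hypothetical helper that builds restic command lines for a CephFS directory
# backed up to an S3 bucket. Endpoint, bucket and paths are placeholders;
# credentials are expected in AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and
# RESTIC_PASSWORD, so they never appear on the command line.
import shlex


def restic_commands(endpoint, bucket, source_dir, restore_dir):
    """Return {step: argv} for the init/backup/verify/restore workflow."""
    repo = f"s3:{endpoint}/{bucket}"
    base = ["restic", "-r", repo]
    return {
        "init": base + ["init"],                      # once per repository
        "backup": base + ["backup", source_dir],      # creates a new snapshot
        "verify": base + ["check"],                   # checks repository consistency
        "list": base + ["snapshots"],                 # shows available snapshots
        "restore": base + ["restore", "latest", "--target", restore_dir],
    }


if __name__ == "__main__":
    cmds = restic_commands("https://s3.example.cern.ch", "my-cephfs-backup",
                           "/cephfs/user/project", "/tmp/restore")
    for step, argv in cmds.items():
        print(f"{step}: {shlex.join(argv)}")
```

      Returning argv lists rather than shell strings lets the same plan be run through subprocess without quoting issues, or printed for documentation.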

    • 14:25 14:30
      Block Storage 5m

      OpenStack Cinder, Beesly, Wigner Decommissioning, ...

      Speaker: Theofilos Mouratidis (National and Kapodistrian University of Athens (GR))

      From Theo:

      Ceph/Flax: FileStore to BlueStore conversion of the 1st host in progress (50%)
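      For reference, a per-OSD in-place conversion can follow the upstream "mark out and replace" BlueStore migration steps. The hypothetical Python sketch below generates that command sequence and computes per-host progress. It is not the team's actual procedure. The OSD id and device are placeholders, and the backend map is assumed to come from the osd_objectstore field of `ceph osd metadata`.

```python
# Hypothetical per-OSD plan following the upstream "mark out and replace"
# FileStore-to-BlueStore migration steps; OSD ids and devices are placeholders.
def conversion_plan(osd_id, device):
    """Ordered shell commands to redeploy one OSD as BlueStore on the same device."""
    return [
        f"ceph osd out {osd_id}",
        # Wait until data has migrated away and the OSD can be destroyed safely.
        f"while ! ceph osd safe-to-destroy {osd_id}; do sleep 60; done",
        f"systemctl kill ceph-osd@{osd_id}",
        f"umount /var/lib/ceph/osd/ceph-{osd_id}",
        f"ceph-volume lvm zap {device}",
        # "destroy" keeps the OSD id so the new BlueStore OSD can reuse it.
        f"ceph osd destroy {osd_id} --yes-i-really-mean-it",
        f"ceph-volume lvm create --bluestore --data {device} --osd-id {osd_id}",
    ]


def host_progress(backends):
    """Percentage of a host's OSDs already on BlueStore.

    `backends` maps OSD id -> objectstore name (assumed to be taken from
    the osd_objectstore field of `ceph osd metadata`).
    """
    if not backends:
        return 0.0
    done = sum(1 for store in backends.values() if store == "bluestore")
    return 100.0 * done / len(backends)


if __name__ == "__main__":
    for cmd in conversion_plan(12, "/dev/sdc"):  # hypothetical OSD and device
        print(cmd)
    print(host_progress({12: "filestore", 13: "bluestore"}))  # 50.0
```

      Converting one OSD at a time and waiting on safe-to-destroy trades conversion speed for never reducing redundancy below what the cluster can tolerate.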

      Ceph/Gabe: Inventory in progress (next target for conversion)

      CephFS: MDS balancing is stalled, waiting for upstream help

    • 14:30 14:35
      CephFS/FILER 5m
      Speaker: Dan van der Ster (CERN)

      From Dan:

      • HPC: preparing the jim cluster for more production usage (moving all HPC homes from flax to jim). Upgraded to 12.2.11-0.6. A network issue broke the cluster (a router line card had to be rebooted): INC1927697 "Very slow network between S513-V-IP193 and S513-V-IP194"
      • HPC 1m
        Speaker: Alberto Chiusole (Universita e INFN Trieste (IT))
    • 14:35 14:40
      HyperConverged 5m
      Speakers: Julien Collet (CERN), Roberto Valverde Cameselle (Universidad de Oviedo (ES))

      Julien:

      • Benchmarking work with Jose is ongoing but currently facing configuration issues. The goal is to assess the performance of different kinds of volumes attached to VMs on kelly.

      Roberto:

      • One pool (tapetest) is left to be removed from the cluster, but Julien Leduc first needs to finish the transition to the new pool (tapecta); he will ping me when it is OK to delete it.
    • 14:40 14:45
      Monitoring 5m

      Julien:

      • Testing of prophetstore (diskprophet dashboard) is ongoing. The installer was a bit broken, but it is now successfully set up on a test VM before testing in prod.
    • 14:45 14:50
      AOB 5m