Ceph/CVMFS/Filer Service Meeting

Europe/Zurich
600/R-001 (CERN)

Description

Zoom: Ceph Zoom

    • 14:00 14:20
      Ceph: Operations Reports 20m
      • Teo (cta, erin, kelly, levinson) 5m
        Speaker: Theofilos Mouratidis (CERN)
        • Rebased PR for rctimes
        • Decommissioned EC03 rack from ceph/erin
        • ceph df on erin shows usage at 15%
          • Can proceed to decommission the rest of the racks
          • Need to reduce pg_num as well (see the sketch below)
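
        A possible follow-up, sketched assuming a Nautilus-era cluster (the pool name is hypothetical); since Nautilus, pg_num can be decreased in place and the PGs are merged:
        ```
        # Check the current pg_num, then lower it stepwise (pool name hypothetical)
        ceph osd pool get erin-data pg_num
        ceph osd pool set erin-data pg_num 2048
        # Alternatively, let the autoscaler converge on a target
        ceph osd pool set erin-data pg_autoscale_mode on
        ```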
      • Enrico (barn, beesly, gabe, meredith, nethub, vault) 5m
        Speaker: Enrico Bocchi (CERN)

        Barn, Beesly, Meredith, Vault: NTR

        Gabe:

        • IPv6 addresses vanishing from s3.cern.ch alias on Friday (9.50 am, 7.57 pm)
        • Rebalancing / decommissioning (see the sketch below):
          • New machines (48x12TB) have (almost all) reached full weight
          • Started draining the big machines with no RAID-1 system disk (2 out of 4)
          • 18 other old p05* machines to drain (13 already fully drained)
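
        A minimal sketch of draining a host ahead of decommissioning (the hostname is hypothetical):
        ```
        # Set the CRUSH weight of the whole host to 0 so data migrates off it
        ceph osd crush reweight-subtree p05xxxxxx 0
        # Follow the drain until all OSDs on the host are empty
        ceph osd df tree
        ```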
        • Users over quota that still manage to write
          • Examples: LHCb Analysis preservation@193.29%, CSC@103.20%
          • Quota is set and enabled at the account level (and used by OpenStack); no quota is set on the buckets
          • Tried to reproduce on gabe and nethub with no luck; test writes were correctly rejected:
            ```
            ERROR: S3 error: 403 (QuotaExceeded)
            ```
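
          For reference, a sketch of how the user-level quota is inspected and set with radosgw-admin (the uid and size below are assumptions):
          ```
          # Inspect quota settings and current usage for a user (uid hypothetical)
          radosgw-admin user info --uid=lhcb-analysis
          radosgw-admin user stats --uid=lhcb-analysis --sync-stats
          # User-scope quota, as used here; max-size is in bytes (1 TiB shown)
          radosgw-admin quota set --uid=lhcb-analysis --quota-scope=user --max-size=1099511627776
          radosgw-admin quota enable --uid=lhcb-analysis --quota-scope=user
          ```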

        Nethub:

        • Traefik working nicely on cephnethub-data-0509dffff2
          • DNS hammering problem solved with dnsmasq (sketch below)
          • Will continue adding machines with Traefik (but fix es-ceph first)
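
        A minimal sketch of the dnsmasq side, assuming a local caching resolver in front of the site DNS (the file path and upstream addresses are assumptions):
        ```
        # /etc/dnsmasq.d/local-cache.conf: cache lookups locally so Traefik's
        # per-request resolution stops hammering the central DNS servers
        listen-address=127.0.0.1
        cache-size=10000
        no-resolv
        server=137.138.16.5
        server=137.138.17.5
        ```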
      • Dan (dwight, flax, kopano, jim) 5m
        Speaker: Dan van der Ster (CERN)
        • dwight:
          • Updated to 14.2.18 -- smooth upgrade; the normal procedure worked perfectly (the yum update did not restart the daemons)
            • Using config from CEPH-1107 to workaround loopback binding issue.
          • Enabled a 2nd active MDS; automatic metadata balancing moves hot dirs every 10s (see the MDS config sketch after this list)
          • New MD features (CEPH-1110):
            • Proactive caps recall for idle clients:
              • "find /" test on /cephfs-dwight, then idled. The clients need to be idle for an hour for the MDS to start recalling. (This is tunable)
            • Throttle readdir requests for clients that are not releasing caps:
              • Default config does not lead to any throttling. Need to slow down caps recall for it to be triggered. When triggered, the client sees `ls` slowed down, as expected. If throttled too much, this leads to a "slow op" on the MDS.
          • Over the weekend both MDSs leaked buffer_anon memory, eventually swapping.
        • flax: NTR
        • kopano: NTR
        • jim: slow ops correspond to a user writing at 5GB/s; the bottleneck is not the SSDs, network, or CPU. Still investigating.
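
        A sketch of the dwight MDS knobs discussed above, with option names as in Nautilus and example values (the fs name and values are assumptions):
        ```
        # Second active MDS; the metadata balancer runs on a 10s tick by default
        ceph fs set cephfs-dwight max_mds 2
        ceph config set mds mds_bal_interval 10
        # Proactive caps recall: how long a session must be quiescent before recall
        ceph config set mds mds_session_cache_liveness_decay_rate 3600
        # Throttle cap acquisition (readdir) for clients that do not release caps
        ceph config set mds mds_session_cap_acquisition_throttle 500000
        ```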
      • Arthur 5m
        Speaker: Arthur Outhenin-Chalandre (CERN)
    • 14:20 14:30
      Ceph: Operations Tools (ceph-scripts, puppet, monitoring, etc...) 10m
      • Dan needs to move the CephFS client session dashboard to MONIT.
    • 14:30 14:40
      Ceph: R&D Projects Reports 10m
      • Reva/CephFS 5m
        Speaker: Theofilos Mouratidis (CERN)
        • Slowly but steadily implementing the storage API
          • 18% done
      • Disaster Recovery 5m
        Speaker: Arthur Outhenin-Chalandre (CERN)
        • More testing on rbd-mirror (see the sketch after this list)
          • snapshot mirroring
          • layering (clone from a snapshot + flatten)
          • Any other ideas?
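
        For context, a sketch of the snapshot-mirroring and layering cases being exercised (pool/image names hypothetical):
        ```
        # Snapshot-based mirroring on a single image
        rbd mirror pool enable volumes image
        rbd mirror image enable volumes/vol-0001 snapshot
        rbd mirror image snapshot volumes/vol-0001   # take a mirror snapshot now
        # Layering: clone from a snapshot, then flatten the clone
        rbd snap create volumes/vol-0001@base
        rbd snap protect volumes/vol-0001@base
        rbd clone volumes/vol-0001@base volumes/clone-0001
        rbd flatten volumes/clone-0001
        ```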
        • Added configuration for replication and cinder-backup with the RBD driver on a cinder test box provided by Jose (see the sketch below)
          • Will test the integration later this week
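
        Roughly, the configuration amounts to cinder.conf fragments like these (a sketch; the backend/section names, paths, and pool names are assumptions):
        ```
        # /etc/cinder/cinder.conf (fragments)
        [ceph]
        volume_driver = cinder.volume.drivers.rbd.RBDDriver
        rbd_pool = volumes
        replication_device = backend_id:secondary,conf:/etc/ceph/secondary.conf,user:cinder

        [DEFAULT]
        backup_driver = cinder.backup.drivers.ceph.CephBackupDriver
        backup_ceph_pool = backups
        ```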
        • Both cinder and cinder-backup natively support enabling journaling on images
          • In cinder this also enables replication per volume
            • Is using the same pool for the replicated type worth it? (Pros: probably easier to migrate, to be tested; Cons: no clear separation between replicated and regular images)
          • cinder-backup does not enable replication per volume, but we could enable it on the whole pool (see the sketch below)
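
        Per-volume replication in cinder is driven by a volume-type extra spec; a sketch (the type name is assumed, the replication_enabled key is the standard one):
        ```
        # Volumes of this type get journaling enabled and are picked up by rbd-mirror
        openstack volume type create rbd-replicated
        openstack volume type set rbd-replicated --property replication_enabled='<is> True'
        # cinder-backup has no per-volume switch, but the whole backup pool
        # can be mirrored in journal mode instead (pool name assumed)
        rbd mirror pool enable backups pool
        ```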
        • Neither cinder nor cinder-backup supports snapshot mirroring
          • But it would not be too hard to add basic support for it if needed
    • 14:40 14:50
      Ceph: Upstream News 10m
    • 14:50 15:05
      CVMFS 15m
      Speakers: Enrico Bocchi (CERN) , Fabrizio Furano (CERN)
    • 15:05 15:10
      AOB 5m