Ceph/CVMFS/Filer Service Meeting

Europe/Zurich
600/R-001 (CERN)

    • 14:00 14:15
      CVMFS 15m
      Speakers: Enrico Bocchi (CERN), Fabrizio Furano (CERN)

      Enrico:

      • unpacked.cern.ch rebuilt from scratch; 2 bugs identified
      • Bug fixes to come with cvmfs-server-2.7.4
      • When switching to the new bucket for unpacked.cern.ch, some clients failed over to other Stratum 1s
        • The event self-cleared within a few hours
      • Regular catalog checks (on Saturday) on all S3-based repositories (a cron sketch follows this list)
      • Dedicated caches for SFT (sft, sft-nightlies) on bagplus: enope
      • Pinged Atlas (atlas, atlas-nightlies), lhcbdev, SFT (sft, sft-nightlies) for CC7 and S3 migration
      • CVMFS Talk at https://www.bigscienceorchestration.org/, Friday 28/08, starting at 7am
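      A minimal sketch of what the Saturday catalog checks could look like, assuming a simple cron-driven loop on the release manager node; the repository list and log paths below are illustrative, only cvmfs_server check itself is the real command:

        #!/bin/bash
        # Hypothetical weekly catalog check over the S3-backed repositories
        # (repo names and log locations are illustrative assumptions).
        set -e
        for repo in unpacked.cern.ch sft.cern.ch sft-nightlies.cern.ch; do
            cvmfs_server check "$repo" > "/var/log/cvmfs-check-${repo}.log" 2>&1
        done
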
      Fabrizio:
       
      The new cvmfs probe is running on cvmfs-probe-fdcd727d26.cern.ch; there have been various iterations with the monitoring team:
      • the report was ported to JSON; the older XML format is now deprecated (a sketch of such a report follows this list)
      • the producer id was changed according to their request
      • we are waiting for them to whitelist it
      • curiously, the monitoring team's very nice docs do not seem to require a numerical assessment of the availability of a degraded service (which the older cvmfs probe was producing)
      • http://monit-docs.web.cern.ch/monit-docs/ingestion/service_metrics.html#sending-service-indicators
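      A hedged sketch of the kind of JSON service-status document the new probe sends; the endpoint and field names here are assumptions loosely based on the MONIT docs linked above and should be verified there:

        #!/bin/bash
        # Illustrative only: POST a service-status document to the MONIT ingestion endpoint.
        # The endpoint URL and field names are assumptions taken from the linked documentation.
        curl -s -X POST -H "Content-Type: application/json" \
             "http://monit-metrics.cern.ch:10012/" \
             -d '[{
                   "producer": "cvmfs",
                   "type": "availability",
                   "serviceid": "cvmfs",
                   "service_status": "available",
                   "availabilityinfo": "all repositories OK"
                 }]'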

       

       
       
    • 14:15 14:30
      Ceph: Operations 15m
      • Incidents, Requests, Capacity Planning 5m
        Speaker: Dan van der Ster (CERN)
        • Report from CFCCM:
          • SSD cluster (CEPH-921) is looking for a location. I had requested these go in the barn, but there is no physical space.
          • Plan: We will drain one of the 4 racks of our existing beesly osd/critical pool (BA09). This will make room for one quad of "chunk 4" hdd storage (1.2PB total).
          • ETA: Oct 2020.
        • flax osd no_contact: INC2526968
          • AFAICT the machine power button was pressed. I am getting in touch with the hw repair service.
        • INC2522809: users observed a block storage issue in a single client IP service. Still not understood.
      • Cluster Upgrades, Migrations 5m
        Speaker: Theofilos Mouratidis (CERN)
        • Levinson is upgraded to v14.2.11 and "fully" in production now as a CephFS cluster. Jose is adding it to the quota request form.
        • This week (a post-upgrade verification sketch follows this list):
          • erin: (confirm with Eric Cano first)
          • nethub (JC)
          • vault (open SSB today for Weds?)
          • beesly (open SSB today for Thurs?)
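        A minimal post-upgrade sanity check, assuming the standard ceph CLI on each cluster's admin node (the exact procedure the team follows is not spelled out in the minutes):

          # After each cluster is upgraded, confirm every daemon reports the target release
          ceph versions                           # all mons/mgrs/osds/mds should show 14.2.11
          ceph health detail                      # no lingering warnings before moving to the next cluster
          ceph osd require-osd-release nautilus   # once all OSDs are running nautilus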
         
         
      • Hardware Repairs 5m
        Speaker: Julien Collet (CERN)

      Julien:

        • 4 drives handled with the repair team (changed label, beesly)
        • bad-osd.sh campaign:
          • a priori, 10 hosts remaining
      • Puppet and Tools 5m
        • New `ceph-weekly-check` cron that runs on each cluster, on the ceph mon leader.
          • Currently two checks: is the ceph balancer enabled? Are the osdmaps getting trimmed? (a sketch follows below)
          • Feel free to propose other things to add to the weekly check.
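        A rough sketch of what the two checks could look like when run on the mon leader; the actual ceph-weekly-check script lives in our puppet code, and the jq field names and the trimming threshold below are assumptions:

          #!/bin/bash
          # Check 1: is the balancer enabled?
          ceph balancer status -f json | jq -e '.active == true' > /dev/null \
              || echo "WARNING: ceph balancer is not active"

          # Check 2: are the osdmaps getting trimmed?
          # A large committed-map range suggests they are not (threshold is illustrative).
          first=$(ceph report 2>/dev/null | jq '.osdmap_first_committed')
          last=$(ceph report 2>/dev/null | jq '.osdmap_last_committed')
          if (( last - first > 750 )); then
              echo "WARNING: osdmaps not trimmed ($((last - first)) maps retained)"
          fi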
      • JIRA 5m
    • 14:30 14:45
      Ceph: Projects, News, Other 15m
      • Kopano/Dovecot 5m
        Speaker: Dan van der Ster (CERN)
        • Dovecot storage:
          • ceph/kopano cluster upgraded to v14.2.11. (In our puppet manifests this will remain as "ceph/kopano"; however, there is an alias "crit-ssd-a" for puppet cephfs clients.)
          • The current dovecot prototype service is using one manila share:
          • [09:44][root@cephadm (production:ceph/admin*0) ~]# ls -l /cephfs-kopano/volumes/_nogroup/3533166e-d6c7-49c3-8bec-75ea8ac4d18f/ 
            total 0
            drwxrwxrwt 1 root root 1 Aug 21 10:20 attachments
            drwxrwxrwt 1 root root 6 Aug 21 16:01 users
          • Dovecot is gzipping and combining mails into 16MB files (mdbox format), and using deduplicated attachment storage ("sis", single-instance storage, which uses hardlinks); a config sketch follows.
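          A hypothetical dovecot configuration excerpt matching the setup described above; only the 16MB mdbox size and the attachments/ and users/ directories of the manila share come from the minutes, while the per-user layout and plugin details are assumptions:

            # mdbox mailboxes on the CephFS manila share, rotated at ~16 MB
            mail_location = mdbox:/cephfs-kopano/volumes/_nogroup/3533166e-d6c7-49c3-8bec-75ea8ac4d18f/users/%u
            mdbox_rotate_size = 16M

            # gzip mails on save
            mail_plugins = $mail_plugins zlib
            plugin {
              zlib_save = gz
            }

            # single-instance ("sis") attachment storage, deduplicated via hardlinks
            mail_attachment_dir = /cephfs-kopano/volumes/_nogroup/3533166e-d6c7-49c3-8bec-75ea8ac4d18f/attachments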
      • REVA/CephFS 5m
        Speaker: Theofilos Mouratidis (CERN)
    • 14:45 14:55
      S3 10m
      Speakers: Julien Collet (CERN), Roberto Valverde Cameselle (CERN)

      Giuliano:

      • ceph/gabe update:
        • testing the update:
          • currently setting up a test cluster to test the luminous -> 14.2.11 upgrade
          • the upgrade will be announced so that osd logs can be checked, as requested
        • the actual update is postponed since a bug might be affecting the openstack side

       

      • ceph/nethub update:
        • will be upgraded this afternoon
    • 14:55 15:05
      Filer/CephFS 10m
      Speakers: Dan van der Ster (CERN), Theofilos Mouratidis (CERN)
      • In preparation for upgrading flax to nautilus I'm contacting users whose usage exceeds quota (a quota-check sketch follows this list).
        • Most are managed by OpenShift -- A. Lossent is taking care of them.
      • Benchmarking CephFS (fuse vs kernel vs ganesha) vs NFS (vanilla) vs EOS
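      A hedged sketch of how over-quota shares can be spotted from a CephFS mount using the standard quota xattrs; the mount point and volume layout below are assumptions (mirroring the manila convention seen in the kopano listing above):

        #!/bin/bash
        # Flag manila share directories whose recursive usage exceeds their CephFS quota.
        # ceph.quota.max_bytes and ceph.dir.rbytes are standard CephFS virtual xattrs;
        # the /cephfs-flax mount point and path layout are illustrative.
        for share in /cephfs-flax/volumes/_nogroup/*/; do
            quota=$(getfattr --only-values -n ceph.quota.max_bytes "$share" 2>/dev/null)
            used=$(getfattr --only-values -n ceph.dir.rbytes "$share" 2>/dev/null)
            if [[ -n "$quota" && -n "$used" && "$quota" -gt 0 && "$used" -gt "$quota" ]]; then
                echo "over quota: $share (used=$used, quota=$quota)"
            fi
        done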
       
    • 15:05 15:10
      AOB 5m