Ceph/CVMFS/Filer Service Meeting

600/R-001 (CERN)

    • 14:00 14:15
      CVMFS 15m
      Speakers: Enrico Bocchi (CERN) , Fabrizio Furano (CERN)


      • Several catalog errors in unpacked.cern.ch and project.cern.ch
        • Reported to the developers; an extra command to fix them is now being implemented
      • Created machines and aliases for caches dedicated to sft repos (2)
        • Will create aliases for the compass repositories (3), but no dedicated caches
        • Once done, will ask CM to push the changes to QA
      • IPv6 fixes on the Stratum 1 caches, thanks to Fabrizio's probe
    • 14:15 14:30
      Ceph: Operations 15m
      • Cluster Upgrades, Migrations 5m
        Speaker: Theofilos Mouratidis (CERN)
      • Hardware Repairs 5m
        Speaker: Julien Collet (CERN)


        • ceph-disk-management:
          • Found a bug in the script when run on the ceph/vault cluster
          • Now fixed
          • Longer-term:
            • use ceph-volume lvm batch, which is very convenient even in non-batch mode
            • this could simplify the scripts and make them more robust
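A minimal sketch of the longer-term idea above, assuming the disk-management scripts are (or wrap) Python and call ceph-volume as a subprocess. The helper builds a ceph-volume lvm batch command line that defaults to --report (show what would be done, change nothing) and only switches to --yes (skip the confirmation prompt) when explicitly asked. The function names and the device path /dev/sdx are hypothetical, not part of the existing scripts.

```python
import shlex
import subprocess
from typing import List, Sequence


def batch_command(devices: Sequence[str], report_only: bool = True) -> List[str]:
    """Build a `ceph-volume lvm batch` command line for the given devices.

    report_only=True adds --report, so ceph-volume only reports the planned
    changes; otherwise --yes is added to skip the interactive confirmation.
    """
    if not devices:
        raise ValueError("at least one device is required")
    cmd = ["ceph-volume", "lvm", "batch"]
    cmd.append("--report" if report_only else "--yes")
    cmd.extend(devices)
    return cmd


def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command; only execute it when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical replacement disk: batch also works with a single device.
    run(batch_command(["/dev/sdx"]))
```

Keeping the report step as the default means a repair run has to opt in explicitly before any OSD is created.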


      • Incidents, Requests, Capacity Planning 5m
        Speaker: Dan van der Ster (CERN)
      • Puppet and Tools 5m
    • 14:30 14:45
      Ceph: Projects, News, Other 15m
      • Kopano/Dovecot 5m
        Speaker: Dan van der Ster (CERN)
      • REVA/CephFS 5m
        Speaker: Theofilos Mouratidis (CERN)
    • 14:45 14:55
      S3 10m
      Speakers: Julien Collet (CERN) , Roberto Valverde Cameselle (CERN)


      • Inspire (HEPData) backup from CephFS to S3: CephFS is the main storage and the plan is to back it up to S3 (requirement: at least one readable copy)
      • rgwxl-bare complains now and then (disabled in roger, /usr/bin/radosgw killed)
      • S3 sync modules
        • Managed to set up metadata sync to a new region using Elasticsearch
        • Trying to come up with a simple way to test and query this metadata
      • Request from a user (RQF1623111)
        • 1-2 TB on S3 to back up persistent volumes used by Kubernetes
        • intended as a backup for disaster recovery
        • would use restic
      • S3 Accounting
        • PR with most of the Puppet config for ceph/accounting
        • Only item left to fix is eos-project access
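For the Kubernetes backup request (RQF1623111), a minimal sketch of how restic could be driven against S3, assuming a Python wrapper run from a scheduled job. It builds the restic S3 repository string (s3:<endpoint>/<bucket>/<prefix>), checks that the credentials restic reads from the environment are set, and lists the init, backup and disaster-recovery restore commands. The endpoint https://s3.cern.ch, the bucket name k8s-pv-backup, the prefix and all paths are assumptions for illustration only.

```python
import shlex
from typing import List, Mapping

# restic reads the S3 credentials and the repository password from the environment.
S3_ENV = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
PASSWORD_ENV = ("RESTIC_PASSWORD", "RESTIC_PASSWORD_FILE")  # either one is enough


def repo_url(endpoint: str, bucket: str, prefix: str = "") -> str:
    """Build a restic S3 repository string, e.g. s3:https://host/bucket/prefix."""
    url = f"s3:{endpoint.rstrip('/')}/{bucket}"
    prefix = prefix.strip("/")
    return f"{url}/{prefix}" if prefix else url


def missing_env(env: Mapping[str, str]) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    missing = [name for name in S3_ENV if not env.get(name)]
    if not any(env.get(name) for name in PASSWORD_ENV):
        missing.append("RESTIC_PASSWORD or RESTIC_PASSWORD_FILE")
    return missing


def restic_plan(repo: str, source_dir: str, restore_dir: str) -> List[List[str]]:
    """Commands for a one-off init, a recurring backup and a disaster-recovery restore."""
    return [
        ["restic", "-r", repo, "init"],  # run once to create the repository
        ["restic", "-r", repo, "backup", source_dir],  # e.g. from a scheduled job
        ["restic", "-r", repo, "restore", "latest", "--target", restore_dir],
    ]


if __name__ == "__main__":
    import os

    # Hypothetical endpoint, bucket and paths, for illustration only.
    repo = repo_url("https://s3.cern.ch", "k8s-pv-backup", "cluster-a")
    for name in missing_env(os.environ):
        print(f"warning: {name} is not set")
    for cmd in restic_plan(repo, "/mnt/pv-data", "/mnt/restore"):
        print(shlex.join(cmd))
```

Because restic encrypts and deduplicates on the client side, the bucket only needs plain S3 access; the repository password must be kept outside the cluster being protected, or the restore step is impossible after a disaster.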
    • 14:55 15:05
      Filer/CephFS 10m
      Speakers: Dan van der Ster (CERN) , Theofilos Mouratidis (CERN)


      • Space exhaustion on itopenshift-nfs02-reg(istry):
        • Using 3.12 TB out of 3.8 TB. Volume is 4 TB.
        • Cannot easily expand due to a ZFS bug on CC7 that makes it impossible to use the new capacity
        • Created itnfs08d with an 8 TB volume and copied the data over with Syncoid
      • GitLab slow or unresponsive (OTG0058058)
        • Peak of reads on gitlab-prod-storage3
        • Caused by memory-usage peaks on one GitLab backend server, which trigger the OOM killer; afterwards, files from the filer need to be cached again
      • Intervention on routers (OTG0057829) impacted filers
        • All smooth, no manual intervention required
        • There was a short (5 min) network disruption at 13:00; OpenStack was also unavailable.
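For the itopenshift-nfs02 space exhaustion, a small sketch of the arithmetic and of the Syncoid migration step, assuming a Python helper. usage_percent reproduces the fill level from the notes (3.12 TB of 3.8 TB usable); syncoid_command builds a syncoid invocation that pushes a local dataset to a remote host. On its first run syncoid sends a full copy; on later runs it sends only the increment since the last common snapshot, so a final short run before switchover keeps downtime small. The dataset names tank/registry and the root user are hypothetical; itnfs08d is the new server from the notes.

```python
import shlex
from typing import List


def usage_percent(used_tb: float, size_tb: float) -> float:
    """Percentage of a volume in use."""
    if size_tb <= 0:
        raise ValueError("size must be positive")
    return 100.0 * used_tb / size_tb


def syncoid_command(src_dataset: str, dst_host: str, dst_dataset: str,
                    recursive: bool = True, user: str = "root") -> List[str]:
    """Build a syncoid invocation that pushes a local dataset to a remote host.

    syncoid snapshots the source, transfers it with zfs send/receive and,
    on later runs, only sends the changes since the last common snapshot.
    """
    cmd = ["syncoid"]
    if recursive:
        cmd.append("--recursive")  # also transfer child datasets
    cmd += [src_dataset, f"{user}@{dst_host}:{dst_dataset}"]
    return cmd


if __name__ == "__main__":
    # Figures from the notes: 3.12 TB used out of 3.8 TB usable.
    print(f"old volume: {usage_percent(3.12, 3.8):.1f}% used")  # 82.1% used
    # Hypothetical dataset names; itnfs08d is the new server from the notes.
    print(shlex.join(syncoid_command("tank/registry", "itnfs08d", "tank/registry")))
```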
    • 15:05 15:10
      AOB 5m
      • Scheduled power cut, Thursday Jul 30, 6:00 - 6:15