Ceph/CVMFS/Filer Service Meeting

Europe/Zurich
600/R-001 (CERN)

  • Jan Iven presented his plans to add 800TB of data to the io1 Ceph block storage for AFS Work and Project Spaces. His current deadline is to decommission 600TB worth of AFS servers by the end of June.
    • Dan explained that this request depends on the new capacity being fully in production and rebalanced, a process which will begin this week and will take an as yet unknown number of weeks; the estimate is to be clarified by next week.
    • Jan will explore with CF whether there is any flexibility in the end-of-June deadline, because some extra time would allow the Ceph team to rebalance the data onto the new hosts more gradually and invisibly.
    • 14:00 14:15
      CVMFS 15m
      Speaker: Enrico Bocchi (CERN)

      Enrico:

      • Migrations:
        • 3 repos migrated: alice-nightlies, cms-bril, compass-condb
        • 2 migrations tomorrow: atlas-condb, cms-opendata-conddb
        • Under discussion: alice-ocdb, lhcb
      • Issues with cvmfs-gateway:
        • s3fanout not making use of parallel connections to S3
        • cvmfs_receiver spin lock
        • cvmfs_receiver SIGABRT when root catalog has more than 200k entries

      Fabrizio + Enrico:

      • Debugging of collectd alarms for failing snapshot on backup machine
      • Ticket to do this systematically on all repos: RQF1574296
       
       
    • 14:15 14:30
      Ceph: Operations 15m
      • Notable Incidents or Requests 5m
      • Repair Service Liaison 5m
        Speaker: Julien Collet (CERN)
      • Backend Cluster Maintenance 5m
        Speaker: Theofilos Mouratidis (CERN)
        • CEPH-888: erin cluster had the wrong quota due to PiB/PB confusion. Now corrected to the usable 80% capacity of the cluster.
        • CEPH-869: Consensus from Linuxsoft, CF, and us is that the RAID-1 EFI partition will be OK. Dan opened a ticket with CF to contact the vendor, but we should go ahead anyway.
        • Discuss CRUSH changes to add OSDs to beesly...
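
Note on CEPH-888: the PiB/PB mix-up is easy to quantify. The sketch below is only an illustration; the 10-unit capacity is a made-up figure (not erin's real size), and the helper `quota_bytes` is hypothetical, assuming the quota is simply 80% of the capacity.

```python
# Illustrative only: the capacity figure is hypothetical, not erin's real size.
PB = 10**15    # petabyte (decimal, SI)
PIB = 2**50    # pebibyte (binary, IEC)


def quota_bytes(capacity_bytes: int, usable_percent: int = 80) -> int:
    """Return a quota in bytes as a percentage of capacity (integer maths)."""
    return capacity_bytes * usable_percent // 100


capacity_value = 10  # hypothetical "10 units" read off a spec sheet

quota_if_pb = quota_bytes(capacity_value * PB)    # value interpreted as PB
quota_if_pib = quota_bytes(capacity_value * PIB)  # value interpreted as PiB

# One PiB is about 12.6% larger than one PB, so the two quotas differ
# by the same factor.
print(f"1 PiB = {PIB / PB:.4f} PB")
print(f"quota difference: {(quota_if_pib - quota_if_pb) / PB:.3f} PB")
```

Reading a PB figure as PiB inflates the quota by roughly 12.6%, which is enough to eat most of a safety margin on a nearly full cluster.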
    • 14:30 14:45
      Ceph: Projects, News, Other 15m
      • Backup 5m
        Speaker: Roberto Valverde Cameselle (CERN)
      • HPC 5m
        Speaker: Dan van der Ster (CERN)
      • Kopano 5m
        Speaker: Dan van der Ster (CERN)
      • Upstream News 5m
        Speaker: Dan van der Ster (CERN)
        • We are still waiting for the next Nautilus release for S3 and CephFS fixes. No ETA at the moment.
    • 14:45 14:55
      S3 10m
      Speakers: Julien Collet (CERN), Roberto Valverde Cameselle (CERN)

      Giuliano

      • CEPH-886: gabe RGW upgrade completed
      • ceph/nethub: mon_warn_on_slow_ping_ratio set to 0.1
        • drastically reduced the number of warning emails received
      • ceph/gabe: adding bare RGW
        • (in progress)
      • S3 accounting:
        • data gathering in place
        • need to test the publishing part
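
Note on the mon_warn_on_slow_ping_ratio change on ceph/nethub: as far as we understand, the ratio is applied to osd_heartbeat_grace to obtain the ping time above which a health warning is raised. The sketch below assumes the upstream defaults (a grace of 20 s and a ratio of 0.05); nethub's actual grace value is not recorded in these minutes, and the function name is hypothetical.

```python
# Assumption: upstream default for osd_heartbeat_grace is 20 seconds;
# the value configured on ceph/nethub may differ.
OSD_HEARTBEAT_GRACE_S = 20


def slow_ping_threshold_ms(ratio: float, grace_s: float = OSD_HEARTBEAT_GRACE_S) -> int:
    """Ping time (ms) above which a slow-heartbeat warning would be raised."""
    return round(ratio * grace_s * 1000)


default_ms = slow_ping_threshold_ms(0.05)  # assumed upstream default ratio
nethub_ms = slow_ping_threshold_ms(0.1)    # ratio set on ceph/nethub

print(f"default threshold: {default_ms} ms, new threshold: {nethub_ms} ms")
```

Under these assumptions the warning threshold doubles, which is consistent with far fewer warning emails.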

      Enrico:

      • Consul and Nomad upgraded on the production cluster (v1.7.2 and v0.11.1, respectively)
      • Working on migration to the Traefik 2.2 series
        • 1.7 series supported until end of 2021
       
    • 14:55 15:05
      Filer/CephFS 10m
      Speakers: Dan van der Ster (CERN), Theofilos Mouratidis (CERN)
      • CEPH-728: auto-evict hung clients with a non-zero mds_cap_revoke_eviction_timeout
        • This is enabled on ceph dwight, set to 900s.
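
Note on CEPH-728: the eviction rule can be modelled roughly as below. This is a simplified sketch of the decision only, not the MDS implementation; the function and argument names are hypothetical, and it assumes that a timeout of 0 leaves auto-eviction disabled.

```python
def should_evict(seconds_since_revoke: float, eviction_timeout_s: float) -> bool:
    """Simplified model: evict a client that has not released caps in time.

    A timeout of 0 is assumed to mean that auto-eviction is disabled.
    """
    if eviction_timeout_s <= 0:
        return False
    return seconds_since_revoke > eviction_timeout_s


DWIGHT_TIMEOUT_S = 900  # value enabled on dwight, per the item above

print(should_evict(1200, DWIGHT_TIMEOUT_S))  # client hung for 20 minutes
print(should_evict(60, DWIGHT_TIMEOUT_S))    # client merely slow
print(should_evict(1200, 0))                 # feature disabled
```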
    • 15:05 15:10
      AOB 5m