Ceph/CVMFS/Filer Service Meeting

Europe/Zurich
600/R-001 (CERN)

    • 14:00 – 14:15
      CVMFS 15m
      Speaker: Enrico Bocchi (CERN)

      Last week:

      • New IT CVMFS - LCG project:
        • 'zero' machines migrated:
          • zero0{1..3} in LCG
          • zero04 in GPN
        • 'front' machines have a performance issue with the new frontier-squid 4.4.10
      • New repositories granted, CC7 + S3:
        • atlas-pixel-daq.cern.ch
        • sw-nightlies.hsf.org
      • Issue with lxcvmfs64 off the network: OTG0055206
        • Release manager (RM) machine for atlas-condb.cern.ch
        • Root cause: live migration to a hypervisor on a different subnet
        • Action: Moved volume to lxcvmfs54
        • The repo should be migrated to CC7 + S3 (see below)
      • Issue with wlcg-squid-monitor.cern.ch
        • German-locale date in the cvmfs_status.json file (see the locale sketch after this list):
          {
            "last_gc": "Son Mär  8 17:34:18 UTC 2020"
          }
        • unpacked.cern.ch last GC was > 10 days ago
      • Sent around the migration to CC7 + S3 message:
        • deadline is end of June 2020
        • atlas-condb, lhcb, alice, alice-ocdb, cms-opendata-conddb
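      One plausible mitigation for the wlcg-squid-monitor parsing issue, assuming the last_gc string is rendered with the Stratum 1's default locale: force the C/POSIX locale in the environment of the job that writes cvmfs_status.json. A minimal sketch (commands and example outputs are illustrative only):

        # Illustrative: with a German locale, abbreviated day/month names are localized,
        # which is what ends up in cvmfs_status.json:
        LC_ALL=de_DE.UTF-8 date -u        # e.g. "So 8. Mär 17:34:18 UTC 2020"

        # Forcing the C/POSIX locale for the GC job restores the English form
        # that the monitor expects:
        LC_ALL=C date -u                  # e.g. "Sun Mar  8 17:34:18 UTC 2020"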

      Upcoming:

      • Remove wlcg-clouds
      • Migrate grid.cern.ch to CC7 + S3
      • Cache split in bagplus:
        • angel - ca-proxy-atlas: atlas, atlas-condb, atlas-nightlies, atlas-online-nightlies, atlas-pixel-daq
        • banded - ca-proxy-alice: alice, alice-nightlies, alice-ocdb
        • bigfin - ca-proxy-lhcb: lhcb, lhcb-condb, lhcbdev
        • cmsmeyproxy.cern.ch (Dave Dykstra): cms, cms-bril, cms-ci, cms-ib, cms-opendata-conddb
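      For illustration of what the cache split means on the client side: each VO's repositories would point at its dedicated proxy alias rather than the shared ca-proxy. A minimal sketch, assuming the aliases listed above and the standard per-repository client config file (the file name and port are assumptions, not the deployed configuration):

        # /etc/cvmfs/config.d/atlas.cern.ch.local  (sketch only; one such file per repository)
        # Route this repository through the ATLAS-specific squid alias instead of the shared proxy:
        CVMFS_HTTP_PROXY="http://ca-proxy-atlas.cern.ch:3128"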
    • 14:15 – 14:30
      Ceph: Operations 15m
      • Notable Incidents or Requests 5m

        (From Dan)

        • ceph/beesly: very slow to add a new mon. The issue is now understood: when a new mon synchronizes, it tries to query all ~600k keys from RocksDB at once. Upstream has a fix that limits this to 2000 keys per chunk; I put a workaround in place capping the sync payload at 4 kB per chunk (see the sketch after this list).
        • Several network interventions planned in March: https://cern.service-now.com/service-portal/ssb.do?tab=interventions
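        A sketch of the workaround mentioned above, assuming the 4 kB cap refers to Ceph's mon_sync_max_payload_size option (the upstream fix instead bounds keys per chunk via mon_sync_max_payload_keys):

          # Cap the mon sync payload so a joining mon pulls the store in small chunks:
          ceph config set mon mon_sync_max_payload_size 4096
          # Verify the override is in place:
          ceph config get mon mon_sync_max_payload_size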
      • Repair Service Liaison 5m
        Speaker: Julien Collet (CERN)

        Giuliano

        • Notified the repair team that they can go ahead with existing/upcoming disk replacements on ceph/beesly
      • Backend Cluster Maintenance 5m
        Speaker: Theofilos Mouratidis (CERN)

        Theo:

        Implementing the upmap feature for the balancer module (see the reference commands below):

        • Finished
        • Needs testing

        Thesis benchmarks are done; we can remove the machine at the end of the month.
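        For reference only (this is the standard upstream workflow, not Theo's implementation), the built-in balancer's upmap mode is enabled along these lines:

          # upmap requires all clients to speak at least the Luminous protocol:
          ceph osd set-require-min-compat-client luminous
          ceph balancer mode upmap
          ceph balancer on
          ceph balancer status        # shows the active mode and any pending plans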

    • 14:30 – 14:45
      Ceph: Projects, News, Other 15m
      • Backup 5m
        Speaker: Roberto Valverde Cameselle (CERN)
      • HPC 5m
        Speaker: Dan van der Ster (CERN)

        HPC Users Workshop https://indico.cern.ch/e/HPC2020

      • Kopano 5m
        Speaker: Dan van der Ster (CERN)
      • Upstream News 5m
        Speaker: Dan van der Ster (CERN)
    • 14:45 – 14:55
      S3 10m
      Speakers: Julien Collet (CERN), Roberto Valverde Cameselle (CERN)

      S3 is 70% full: https://filer-carbon.cern.ch/grafana/d/000000001/ceph-dashboard?orgId=1&refresh=30s&var-cluster=gabe&fullscreen&panelId=157&from=now-90d&to=now
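      Besides the Grafana panel, the fill level can be cross-checked on the cluster itself; a quick sketch (the bucket name is a placeholder):

        # Overall raw and per-pool usage on the S3 cluster:
        ceph df
        # Per-bucket usage, if a specific bucket needs investigating:
        radosgw-admin bucket stats --bucket=<bucket-name>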


      Giuliano

      • ceph/nethub updated to 14.2.8
      • s3-fr-prevessin-1.cern.ch alias now points to 3 RGWs
      • S3 cluster ready for testing (with Roberto)
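      Two quick checks for the items above, as a sketch (both are standard commands; the expected answers are the figures quoted in the bullets):

        # Confirm every daemon on ceph/nethub reports the new release:
        ceph versions                          # mons/mgrs/osds/rgws should all show 14.2.8
        # Confirm the new alias resolves to the three RGW hosts:
        dig +short s3-fr-prevessin-1.cern.ch   # expect three records (or a CNAME chain to them)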
    • 14:55 – 15:05
      Filer/CephFS 10m
      Speakers: Dan van der Ster (CERN), Theofilos Mouratidis (CERN)

      The last filers need to be decommissioned by the end of March: https://its.cern.ch/jira/projects/FILER/issues/FILER-120?filter=allopenissues


      Wojciech hit the same ZFS volume-extension bug that we saw on the Filer service. Can someone try to reproduce it or find a new procedure?
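      For whoever picks this up, the generic online-extension sequence on ZFS after growing the underlying device looks roughly like this (pool and device names are placeholders, not the Filer recipe):

        # Allow the pool to grow into newly available capacity:
        zpool set autoexpand=on tank
        # Re-open the grown device so ZFS picks up its new size:
        zpool online -e tank /dev/vdb
        # SIZE / EXPANDSZ should reflect the growth:
        zpool list tank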


      Theo:

      Created itnfs23c for twiki

      Created a new volume for Argus; Ben Jones has access to it and can now copy the data.

    • 15:05 – 15:10
      AOB 5m