Ceph/CVMFS/Filer Service Meeting

600/R-001 (CERN)



    • 2:00 PM - 2:15 PM
      CVMFS 15m
      Speaker: Enrico Bocchi (CERN)

      Last week:

      • New IT CVMFS - LCG project:
        • "zero" machines migrated:
          • zero0{1..3} in LCG
          • zero04 in GPN
        • "front" machines have a performance issue with the new frontier-squid 4.4.10
      • New repositories granted, CC7 + S3:
        • atlas-pixel-daq.cern.ch
        • sw-nightlies.hsf.org
      • Issue with lxcvmfs64 being unreachable on the network: OTG0055206
        • Release manager (RM) for atlas-condb.cern.ch
        • Root cause: live migration to a hypervisor on a different subnet
        • Action: moved the volume to lxcvmfs54
        • The repo should be migrated to CC7 + S3 (see below)
      • Issue with wlcg-squid-monitor.cern.ch
        • German-locale date in the cvmfs_status.json file:
          "last_gc": "Son Mär  8 17:34:18 UTC 2020"
        • unpacked.cern.ch last GC was > 10 days ago
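The locale bug can be reproduced and worked around in a few lines; a minimal sketch (the abbreviation map and function name are illustrative, not the monitor's actual code — only the quoted "last_gc" value comes from the notes above):

```python
from datetime import datetime, timezone

# Illustrative map from German day/month abbreviations (as seen in the
# broken "last_gc" field) to the English ones that strptime expects.
GERMAN_TO_ENGLISH = {"Mär": "Mar", "Mai": "May", "Okt": "Oct", "Dez": "Dec",
                     "Son": "Sun", "Die": "Tue", "Mit": "Wed",
                     "Don": "Thu", "Fre": "Fri", "Sam": "Sat"}

def parse_last_gc(value: str) -> datetime:
    """Parse a ctime-style UTC timestamp, tolerating German abbreviations."""
    for de, en in GERMAN_TO_ENGLISH.items():
        value = value.replace(de, en)
    # Matches e.g. "Sun Mar  8 17:34:18 UTC 2020" (whitespace runs are
    # handled leniently by strptime).
    return datetime.strptime(value, "%a %b %d %H:%M:%S UTC %Y").replace(
        tzinfo=timezone.utc)

print(parse_last_gc("Son Mär  8 17:34:18 UTC 2020"))
# → 2020-03-08 17:34:18+00:00
```

The robust fix is of course to emit the timestamp in a locale-independent format (e.g. ISO 8601 or `LC_ALL=C`) rather than to translate on the consumer side.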
      • Sent around the migration-to-CC7+S3 announcement:
        • deadline is end of June 2020
        • atlas-condb, lhcb, alice, alice-ocdb, cms-opendata-conddb


      • Remove wlcg-clouds
      • Migrate grid.cern.ch to CC7 + S3
      • Cache split in bagplus:
        • angel - ca-proxy-atlas: atlas, atlas-condb, atlas-nightlies, atlas-online-nightlies, atlas-pixel-daq
        • banded - ca-proxy-alice: alice, alice-nightlies, alice-ocdb
        • bigfin - ca-proxy-lhcb: lhcb, lhcb-condb, lhcbdev
        • cmsmeyproxy.cern.ch (Dave Dykstra): cms, cms-bril, cms-ci, cms-ib, cms-opendata-conddb
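On the client side, a cache split like the one above is typically expressed by pointing each repository at its dedicated proxy via CVMFS_HTTP_PROXY. A hedged sketch of what a per-repository override could look like (the file path, port, and fully qualified proxy hostname are assumptions, not taken from the notes):

```shell
# /etc/cvmfs/config.d/atlas.cern.ch.local -- illustrative only
# Route the ATLAS repository through its dedicated proxy from the split above
# (hostname/port assumed; adjust to the actual alias):
CVMFS_HTTP_PROXY="http://ca-proxy-atlas.cern.ch:3128"
```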
    • 2:15 PM - 2:30 PM
      Ceph: Operations 15m
      • Notable Incidents or Requests 5m

        (From Dan)

        • ceph/beesly: very slow to add a new mon. The issue is now understood: when synchronizing, a new mon tries to query 600k keys from RocksDB all at once. Upstream has a fix that limits this to 2000 keys per chunk; I put in a workaround (4 kB per chunk).
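Mechanically, the fix paginates the key transfer. A toy sketch of the idea (illustrative Python, not Ceph's actual mon-sync code; note the workaround above caps the chunk by payload size rather than by key count):

```python
def chunked(keys, max_per_chunk=2000):
    """Yield the key list in bounded chunks, mirroring the upstream fix
    of syncing at most 2000 keys per message instead of 600k at once."""
    for i in range(0, len(keys), max_per_chunk):
        yield keys[i:i + max_per_chunk]

# 600k keys now arrive as 300 bounded messages instead of one huge query.
print(len(list(chunked(list(range(600_000))))))  # → 300
```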
        • Several network interventions planned in March: https://cern.service-now.com/service-portal/ssb.do?tab=interventions
      • Repair Service Liaison 5m
        Speaker: Julien Collet (CERN)


        • Notified the repair team that they can go ahead with existing/upcoming disk replacements on ceph/beesly
      • Backend Cluster Maintenance 5m
        Speaker: Theofilos Mouratidis (CERN)


        Implementing the upmap feature for the balancer module:

        • Finished
        • Needs testing

        Thesis benchmarks are done; we can remove the machine at the end of the month.

    • 2:30 PM - 2:45 PM
      Ceph: Projects, News, Other 15m
      • Backup 5m
        Speaker: Roberto Valverde Cameselle (CERN)
      • HPC 5m
        Speaker: Dan van der Ster (CERN)

        HPC Users Workshop https://indico.cern.ch/e/HPC2020

      • Kopano 5m
        Speaker: Dan van der Ster (CERN)
      • Upstream News 5m
        Speaker: Dan van der Ster (CERN)
    • 2:45 PM - 2:55 PM
      S3 10m
      Speakers: Julien Collet (CERN) , Roberto Valverde Cameselle (CERN)

      S3 is 70% full: https://filer-carbon.cern.ch/grafana/d/000000001/ceph-dashboard?orgId=1&refresh=30s&var-cluster=gabe&fullscreen&panelId=157&from=now-90d&to=now



      • ceph/nethub updated to 14.2.8
      • s3-fr-prevessin-1.cern.ch is an alias pointing to 3 RGWs
      • S3 cluster ready for testing (with Roberto)
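A quick way to check how many hosts sit behind such a DNS alias (generic Python sketch; querying the CERN alias itself requires access to CERN's DNS):

```python
import socket

def backends(hostname, port=443):
    """Return the distinct IPv4 addresses a DNS name resolves to."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET,
                               socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})

# backends("s3-fr-prevessin-1.cern.ch") should list the addresses of the
# 3 RGWs behind the alias (network access to CERN's DNS required).
```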
    • 2:55 PM - 3:05 PM
      Filer/CephFS 10m
      Speakers: Dan van der Ster (CERN) , Theofilos Mouratidis (CERN)

      The last filers need to be decommissioned by the end of March: https://its.cern.ch/jira/projects/FILER/issues/FILER-120?filter=allopenissues


      Wojciech hit the same ZFS volume-extension bug that we saw on the Filer. Can someone try to reproduce it and find a new procedure?



      Created itnfs23c for TWiki

      Created a new volume for Argus; Ben Jones has access to it and can now copy the data

    • 3:05 PM - 3:10 PM
      AOB 5m