Ceph/CVMFS/Filer Service Meeting

31/3-004 - IT Amphitheatre (CERN)

    • 2:30 PM - 2:45 PM
      CVMFS 15m
      Speakers: Enrico Bocchi (CERN), Fabrizio Furano (CERN)
      • Fabrizio 5m
        Speaker: Fabrizio Furano (CERN)
      • Enrico 5m
        Speaker: Enrico Bocchi (CERN)
        • Catalog problem on lhcb-condb.cern.ch persists
        • lhcbdev.cern.ch updated to cvmfs-2.9
          • Showing problems since Saturday -- 2/6 publishers still work
          • Might be a lease database problem on the gateway
          • Fixed at around 14:30 -- Root cause is still under investigation
    • 2:45 PM - 3:05 PM
      Ceph Operations Reports 20m
      • Teo (cta, kelly) 5m
        Speaker: Theofilos Mouratidis (CERN)
        • production CTA "object store" migrated successfully to cephcta from cephkelly.
          • This involved a data migration. The original plan was to use `rados export | rados import`, but there was no way to limit it to a few namespaces, so Teo wrote a simple export/import tool specific to CTA objects.
          • There is still some dev/ci activity on cephkelly -- it will be migrated to dwight.
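
The namespace-limited copy described above can be illustrated with a minimal sketch. It is not Teo's tool and does not use the Ceph API: a pool is modelled as a plain Python dict keyed by (namespace, object name), and the function names and the namespace names in the usage note are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical stand-in for a pool: maps (namespace, object_name) -> object data.
Pool = Dict[Tuple[str, str], bytes]


def migrate_namespaces(src: Pool, dst: Pool, namespaces: Iterable[str]) -> int:
    """Copy only the objects whose namespace is listed in `namespaces`.

    Returns the number of objects copied. Objects in other namespaces
    are left in the source and never written to the destination.
    """
    wanted = set(namespaces)
    copied = 0
    for (ns, name), data in src.items():
        if ns not in wanted:
            continue  # skip namespaces that are not part of this migration
        dst[(ns, name)] = data
        copied += 1
    return copied


def verify_migration(src: Pool, dst: Pool, namespaces: Iterable[str]) -> bool:
    """Check that every selected source object exists unchanged in the destination."""
    wanted = set(namespaces)
    return all(
        dst.get(key) == data for key, data in src.items() if key[0] in wanted
    )
```

For example, with a source holding objects in the made-up namespaces "cta-prod" and "cta-dev", calling `migrate_namespaces(src, dst, ["cta-prod"])` copies only the first group, and `verify_migration` can be run afterwards before retiring the source.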
      • Enrico (barn, beesly, gabe, meredith, nethub, ryan, vault) 5m
        Speaker: Enrico Bocchi (CERN)

        Barn, Gabe, Meredith, Ryan: NTR


        • RA* capacity fully out by today :party:
        • CD* capacity fully in (all but one host) :party^2:
        • ...and latency is amazing -- see plot


        • All hosts rebooted to apply `write_through`
        • Latency is *very* low: ~5ms
        • Would be nice to confirm impact on AFS -- haven't heard back yet


        • New capacity in HA08 installed, public IPs, filled with data
          • Same trick of splitting rack into two (or weight would be too high)
          • Cluster is now ~60% full (rgw.buckets.data)
        • (Re-)starting draining of HA07 and HA06
          • Should fit into new HW -- tbc
          • Will continue over xmas (slowly) to replace HW at the very beginning of 2022
        • SELinux issue fixed by reinstalling pcp-selinux (and rebooting)
      • Dan (dwight, flax, kopano, jim, upstream) 5m
        Speaker: Dan van der Ster (CERN)

      • Arthur (levinson, pam) 5m
        Speaker: Arthur Outhenin-Chalandre (CERN)
        • Monit will fix their Grafana API!
          • We will use their dev instance as QA for our dashboards as well (and to preview changes on MRs in ceph-monit)!
        • Working on a Terraform presentation for the next section meeting
      • Jose (OpenStack) 5m
        Speaker: Jose Castro Leon (CERN)
      • Robert (Kubernetes) 5m
        Speaker: Robert Vasek (CERN)
        • Working on CSI for snapshots.
    • 3:05 PM - 3:15 PM
      R&D Projects Reports 10m
      • Monitoring NG 5m
        Speaker: Aswin Toni (CERN)


        • OpenStack exporter integration
        • WIP:
          • Get tenant names from UUID
          • New dashboard to view a particular tenant's activity
          • More extensive tenant information
          • Top n or avg or both?
          • It's a bit slow
          • Missing information?
        • Making sense of the data
          • What would be useful to display
          • Verify data accuracy
      • EOS CephFS Test 5m
        Speaker: Roberto Valverde Cameselle (CERN)
      • Reva/CephFS 5m
        Speaker: Theofilos Mouratidis (CERN)
      • Disaster Recovery 5m
        Speaker: Arthur Outhenin-Chalandre (CERN)
        • Still working on rbd-mirror
        • Discussion with the network team
          • Need to have some servers (rgw+route server or reflectors) that they can plug into their lab
            • 8 servers or VMs in total to fully test that
          • Will prepare the required configuration in the meantime; once the servers are ready, we will ping them back to test a pilot

        • PreCC proof-of-concept in 2022 -- documents shared. Tim estimated 5d to deploy the clusters and 15d to set up multi-region. (Not including any R&D work, maglev, or subsequent testing -- just the "setup" time.)
          • Total is 100 person-days to set up the Indico DR/BC PoC.
    • 3:15 PM - 3:20 PM
      AOB 5m
      • Cephalocon abstract
      • xmas communication channels:
        • Mattermost: {Ceph, CVMFS, Filer} Internal
        • Mattermost dept-wide: BestOverXmas and DownForEveryoneOrJustMe
        • Mattermost topical: atlas-s3, inspire-s3/cephfs, LHCb cvmfs, malt mail infra, mic filers, gitlab storage, ...
        • Whatsapp: Cephalopods, GSS
        • Telegram: Storage Crisis @CERN