Ceph/CVMFS/Filer Service Meeting

600/R-001 (CERN)



    • 1
      Speaker: Enrico Bocchi (CERN)
      • Signing of the whitelist with a YubiKey was successful for boss.cern.ch
      • projects.cern.ch is now visible within CERN only
        • Need to tweak the bucket ACL on the S3 side
      • Hardlinks issue with repositories whose release manager has been upgraded to CC7
        • Developers provided a tool to fix this
        • Will apply on one repository and proceed with the others if successful
        • 6 repositories affected. Not blocking.
      • Root partition full on ams.cern.ch
        • Recently migrated to CC7
        • Root partition used for spooling during the transaction
        • Workaround: Run smaller transactions (or re-create the release manager with bigger flavor)
      • Access to EOS project spaces impacted by the migration to the new backend
        • Requires an explicit mount of project-{a..z} in Puppet
        • Review the configuration for all the release managers today --> Done
      • Collectd sensors for zfs/zpool working intermittently
        • Investigating...
    • 2
      Ceph Upstream News

      Releases, Tickets, Testing, Board, ...

      Speaker: Dan van der Ster (CERN)

      Ceph Nautilus 14.2.5, release date: TBA (almost ready)

      This version fixes a possible data corruption bug present in 14.2.3 and 14.2.4: OSDs with separate DB and WAL devices may hit checksum errors in RocksDB.

      We will wait for this release before upgrading our clusters that use SSDs.

      Also fixed: an unnecessary HEALTH_ERR caused by backfill miscalculations.
      PGs backfilling to OSDs with well under 85% usage were put into the backfill_toofull state and stopped backfilling until a recalculation was done.

      ceph-erin suffered from this due to large backfills.

    • 3
      Ceph Backends & Block Storage

      Cluster upgrades, capacity changes, rebalancing, ...
      News from OpenStack block storage.

      Speaker: Theofilos Mouratidis (National and Kapodistrian University of Athens (GR))

      ceph-erin: 4th rack reformatted, 2 remain

    • 4
      Ceph Disk Management

      OSD Replacements, Liaison with CF, Failure Predictions

      Speaker: Julien Collet (CERN)


      • Still getting tickets from the repair team, even though the KB article/guide provided should be clear enough
      • We may need to print the course of action they should take in case of an error
    • 5

      Ops, Use-cases (backup, DB), ...

      Speakers: Julien Collet (CERN) , Roberto Valverde Cameselle (Universidad de Oviedo (ES))
    • 6

      Filer Migration, CephFS/Manila, HPC status and plans.

      Speakers: Dan van der Ster (CERN) , Pablo Llopis Sanmillan (CERN)
    • 7
      Speakers: Jose Castro Leon (CERN) , Julien Collet (CERN) , Roberto Valverde Cameselle (Universidad de Oviedo (ES))
    • 8
    • 9


      • Ceph has been automated enough to reduce the number of tickets to an amount the service managers can handle.
      • The tickets that are still created are either too complex to be handled by the rota people, or are GNIs that are already handled quickly enough.
      • Conclusion: Ceph should not be added to the rota for now; we may re-evaluate this in the future.