Current setup:
Ceph cluster configured with Ceph Ansible, running Mimic 13.2.6 (the dashboard isn't as pretty as Nautilus's!).
8 disk nodes, each with 96 GB memory, 12 CPU cores and 20 OSDs. Each disk is 10 TB.
Currently have 8 Mons, one per storage node. RAL has 5 monitors for Echo and 3 for every other cluster, running on separate nodes that are relatively static compared to the storage nodes. For a cluster of Glasgow's size the recommendation is 3 Mons.
CRUSH hierarchy is flat and hasn't been changed from the default failure domain (host).
Erasure-coded resilience, 8 + 2 (k=8 data chunks, m=2 coding chunks).
A pool called atlas will be created with 1024 PGs (sketched below). The LibradosStriper settings will be the same as RAL's; Sam tested and found the sweet spot was around 8 MB objects.
2 additional nodes with SSDs to be used as XRootD proxy gateways.
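
For reference, a minimal sketch of creating the erasure-code profile and the atlas pool described above from Python via librados. The profile name ec-8-2 is just an example, and the JSON argument names passed to mon_command are my reading of the usual mon command descriptors, so treat it as an illustration; the plain ceph CLI equivalents are in the comments.

    #!/usr/bin/env python3
    # Sketch: create the 8+2 EC profile and the 1024-PG 'atlas' pool via librados.
    # CLI equivalents:
    #   ceph osd erasure-code-profile set ec-8-2 k=8 m=2 crush-failure-domain=host
    #   ceph osd pool create atlas 1024 1024 erasure ec-8-2
    import json
    import rados

    def mon_cmd(cluster, **cmd):
        """Send a JSON mon command and raise if it fails."""
        ret, out, errs = cluster.mon_command(json.dumps(cmd), b'')
        if ret != 0:
            raise RuntimeError('%s failed: %s' % (cmd.get('prefix'), errs))
        return out

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')  # admin keyring assumed
    cluster.connect()
    try:
        # 8 data + 2 coding chunks, failure domain host (the default CRUSH level in use)
        mon_cmd(cluster,
                prefix='osd erasure-code-profile set',
                name='ec-8-2',
                profile=['k=8', 'm=2', 'crush-failure-domain=host'])
        # erasure-coded pool with 1024 placement groups
        mon_cmd(cluster,
                prefix='osd pool create',
                pool='atlas',
                pg_num=1024,
                pgp_num=1024,
                pool_type='erasure',
                erasure_code_profile='ec-8-2')
    finally:
        cluster.shutdown()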
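
The 8 MB figure can be reproduced with a rough object-size test like the one below: libradosstriper splits each file into fixed-size RADOS objects, so timing writes of plain 8 MB objects with python-rados gives a feel for the sweet spot. This uses plain librados rather than the striper API, and the object names only mimic the striper's chunk-naming convention.

    #!/usr/bin/env python3
    # Sketch: time writes of fixed-size objects (e.g. 8 MB) to the 'atlas' pool
    # to probe the object-size sweet spot used with libradosstriper.
    import time
    import rados

    OBJECT_SIZE = 8 * 1024 * 1024   # 8 MB, the reported sweet spot
    N_OBJECTS = 64                  # ~512 MB of test data

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('atlas')
    try:
        payload = b'\0' * OBJECT_SIZE
        start = time.time()
        for i in range(N_OBJECTS):
            # suffix loosely mimics the striper's <name>.<16-hex-digit-index> chunk names
            ioctx.write_full('benchmark.%016x' % i, payload)
        elapsed = time.time() - start
        mb = N_OBJECTS * OBJECT_SIZE / 1e6
        print('wrote %.0f MB in %.1f s (%.1f MB/s)' % (mb, elapsed, mb / elapsed))
    finally:
        ioctx.close()
        cluster.shutdown()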
Further improvements:
Move Mons onto Gateway nodes.
Best performance for Mons is to put RocksDB on an SSD.
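
If the Mons do move onto the gateway nodes, a quick sanity check after re-deploying them is to confirm the monitor quorum from a client. A minimal sketch, assuming the default ceph.conf path and an admin keyring:

    #!/usr/bin/env python3
    # Sketch: list the Mons currently in quorum (e.g. after moving them).
    import json
    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ret, out, errs = cluster.mon_command(
            json.dumps({'prefix': 'quorum_status', 'format': 'json'}), b'')
        if ret != 0:
            raise RuntimeError(errs)
        status = json.loads(out)
        print('mons in quorum:', ', '.join(sorted(status['quorum_names'])))
    finally:
        cluster.shutdown()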
Other:
For upgrades you want to do the Mons before the OSDs. If you put Mons on a host with OSDs then, unless you are careful, you can end up upgrading and restarting both at the same time and breaking that ordering (a version check is sketched below).
With libradosStriper, if you lose one PG you should assume you have lost the cluster: every large file is striped across many objects, and hence many PGs, so there will be small holes in most files (see the back-of-envelope calculation below).
Gareth said that the important thing is to demonstrate that this setup works well enough to justify funding an expansion. Getting the resilience configuration right just now is not essential.
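
On the upgrade-ordering point above: from Luminous onwards, ceph versions reports the running version of each daemon type as JSON, so a small check like the sketch below can confirm all the Mons are on the new release before any OSDs are restarted. The exact output layout is assumed here, so treat it as illustrative.

    #!/usr/bin/env python3
    # Sketch: confirm every Mon reports the same release before touching the OSDs.
    import json
    import subprocess

    # 'ceph versions' prints a JSON breakdown of running daemon versions by type
    versions = json.loads(subprocess.check_output(['ceph', 'versions']))
    mon_versions = versions.get('mon', {})
    print('mon versions:', mon_versions)
    if len(mon_versions) != 1:
        raise SystemExit('Mons are running mixed versions - finish the Mon upgrade first')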
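
A back-of-envelope illustration of the "small holes in most files" point: with 8 MB striper objects spread (approximately uniformly) over 1024 PGs, the chance that a given file has at least one object in a single lost PG rises quickly with file size, so most large files end up damaged even though only a tiny fraction of each one is missing.

    #!/usr/bin/env python3
    # Back-of-envelope: probability that a file striped into 8 MB objects
    # has at least one object landing in a single lost PG (uniform placement assumed).
    PG_COUNT = 1024
    OBJECT_SIZE = 8 * 1024**2   # 8 MB striper objects

    for file_size_gb in (1, 10, 100):
        n_objects = (file_size_gb * 1024**3) // OBJECT_SIZE
        p_hit = 1 - (1 - 1 / PG_COUNT) ** n_objects
        print('%4d GB file (%d objects): P(hit by one lost PG) = %.0f%%'
              % (file_size_gb, n_objects, 100 * p_hit))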