Background: Filestore stores objects on XFS and creates subdirectories once the number of objects in a directory reaches a threshold. This "splitting" adds a small latency whenever it is triggered, so in the past we worked around it by raising the threshold. The threshold is now so high that a split, when it is triggered, causes a hang of tens of seconds.
I have started a campaign to split the PGs into smaller directories -- this is done offline, while the OSD is stopped, to prevent any slow requests.
This should be done by the end of this week, after which we can resume balancing, etc.
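For reference, a rough sketch of how the split threshold follows from the Filestore settings. The Ceph documentation gives the maximum number of files in a subdirectory before splitting as filestore_split_multiple * abs(filestore_merge_threshold) * 16. The function name and the example values below are illustrative assumptions, not the settings used on our clusters.

```python
def split_threshold(split_multiple: int, merge_threshold: int) -> int:
    """Approximate number of objects in a directory before Filestore splits it.

    Uses the documented relation:
        filestore_split_multiple * abs(filestore_merge_threshold) * 16
    """
    return split_multiple * abs(merge_threshold) * 16


# Hypothetical example values, chosen only to show how raising the
# settings pushes the split point up (not our production configuration).
for multiple, merge in [(2, 10), (8, 40)]:
    print(f"split_multiple={multiple} merge_threshold={merge} "
          f"-> split at ~{split_threshold(multiple, merge)} objects")
```

The larger the threshold, the more objects have to be moved into new subdirectories in a single split, which fits the long hangs described above.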
Ceph Disk Management 5m
OSD Replacements, Liaison with CF, Failure Predictions