(Georgios)

Request: We need more space for PPS, we're at 97% capacity on eospps-ns* (already with compression).
- Could we add a second SSD, so that we can test rocksdb with multiple data directories?
  - Luca: won't happen - complicated workflow to add HW.
  - Dan: servers would already have 2 SSDs, perhaps put OS on spinning disk.
  - Elvin: would like to verify that RocksDB can actually split over several devices
  - Massimo: take offline, will try to get test machine
- Could we move away from VMs for the NS machines? Virtualization severely limits the IOPS we can get from SSDs (only a couple of thousand; a real SSD can do ~100k, a real disk ~300; all seen with both QuarkDB and the "fio" synthetic test).
  - Luca: please benchmark the machine he prepared
  - Dan: (->offline) please talk to Arne, he might be investigating such limits.
Two weeks ago, PPS stopped working upon reaching 2^32 files. A type confusion (id_t) in the code caused file and container IDs, which should be 64 bits, to be truncated to 32 bits.
- Safety checks detected there was something wrong, and the NS was refusing to boot.
- Elvin: also would affect aquamarine. IDs get increased, never re-used - will this affect EOSUSER? Only if somebody creates+deletes files in a loop.
- Jan: could this affect FUSE clients? yes in principle, but Georgios did check, nothing found
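A minimal sketch of the truncation described above, assuming the narrower type behaves like an unsigned 32-bit integer. The function names and the consistency check are hypothetical illustrations, not the actual EOS code or its safety checks.

```python
# Illustration only: mimics a 64-bit ID passing through a 32-bit unsigned type.
# All names here are hypothetical and not taken from the EOS code base.

MASK32 = 0xFFFFFFFF  # the low 32 bits that survive truncation


def truncate_to_u32(file_id: int) -> int:
    """Return what remains of a 64-bit ID after storing it in 32 bits."""
    return file_id & MASK32


def check_id_consistency(original_id: int, stored_id: int) -> None:
    """Hypothetical safety check: refuse to continue if the ID was altered."""
    if original_id != stored_id:
        raise ValueError(
            f"ID mismatch: expected {original_id}, found {stored_id}"
        )


last_ok = 2**32 - 1   # largest ID that still fits in 32 bits
first_bad = 2**32     # first ID that no longer fits

# IDs below 2^32 pass through unchanged.
check_id_consistency(last_ok, truncate_to_u32(last_ok))

# From 2^32 on, the high bits are lost and IDs wrap around to small values,
# so a new file would appear to collide with a much older ID.
wrapped = truncate_to_u32(first_bad)
```

Since IDs only ever increase and are never re-used, the problem appears only once the counter itself crosses 2^32, regardless of how many files exist at that moment.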
Experimenting with optimizing how metadata is laid out on disk, to reduce the high number of IOPS incurred when listing a directory.
- Goal: file metadata within the same directory is physically colocated on disk.
- Listings would thus benefit greatly from kernel page cache, and rocksdb block cache.
- Jan: worried about intrusive changes close to production roll-out? Would probably be last major change (and would need a conversion campaign on EOSPPS, prod instances will have this from the start).
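One way to picture the colocation idea, as a sketch under stated assumptions: in a sorted key-value store, prefixing each file's key with its parent directory ID keeps all entries of a directory adjacent, so a listing becomes one contiguous range scan. The key format below (8-byte big-endian parent ID followed by the file name) is a hypothetical example, not the actual QuarkDB on-disk layout.

```python
import bisect
import struct

# Hypothetical key layout: 8-byte big-endian parent ID, then the file name.
# Big-endian makes byte-wise ordering agree with numeric ordering of the ID.


def file_key(parent_id: int, name: str) -> bytes:
    """Build a sortable key that groups files by their parent directory."""
    return struct.pack(">Q", parent_id) + name.encode("utf-8")


def list_directory(sorted_keys: list, parent_id: int) -> list:
    """Return the file names in one directory with a single range scan."""
    prefix = struct.pack(">Q", parent_id)
    start = bisect.bisect_left(sorted_keys, prefix)  # first key >= prefix
    names = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # left the directory's contiguous key range
        names.append(key[len(prefix):].decode("utf-8"))
    return names


# A sorted list stands in for the store's ordered key space.
entries = [(2, "b.txt"), (1, "z.txt"), (2, "a.txt"), (3, "c.txt"), (1, "y.txt")]
keys = sorted(file_key(parent, name) for parent, name in entries)

listing = list_directory(keys, 2)
```

Because the keys of one directory sit next to each other, they would tend to share the same on-disk blocks, which is what lets the page cache and block cache help with listings.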

(Andreas)

Changed the behaviour of atomic uploads to avoid any file-loss scenario caused by overlapping/replayed open/uploads (EOS-2571)
Fixed looping bug in FST (MgmSyncer.cc): a bogus mtime made a thread run in a tight loop (filling up the /var partition in EOSUSER/GENOME)
Fix is in the CITRINE & AQUAMARINE branches

(Massimo)

Memleaks in (old) namespace, linked to file creation - seems to also affect new NS (QuarkDB starts paging). Other issues: every hour, on the hour, see mem increase - log truncation? Last issue is runaway mem consumption.

(Jan): new NS roll-out - status (EOSHOME has 1 filesystem)?

have Foreman hostgroup, puppet config (MGM and QuarkDB on same box), have redirector, have NS for first instance (but will wait for new QuarkDB on-disk layout)