Philippe: I am currently reviewing outstanding merge requests. We have 3 merge requests introducing support for LZ4. One of them also introduces support for lzo, zopfli, and brotli. I am concerned that adding these will increase our maintenance load (following changes in the libraries, testing, etc.). I think we should only add compression algorithms that are significantly better on some (useful) criterion in some real cases. So far LZ4 has not crossed this threshold; we see some minor improvements in some cases, but nothing significant. See the pull request for more information:
Brian: So far with the CMS files, we see only a very small performance increase when we switch to LZ4. However, I expect this improvement to increase (hopefully significantly) when we start using the bulk I/O in more cases (because the relative time spent in (de)compression will be higher).
Brian: In raw tests where zlib and lz4 are compared, lz4 is 10 times faster. But within ROOT it is only modestly faster: 10 or 15% faster.
Philippe: Should we test files with absurdly large buffers to check whether this influences the (de)compression speed?
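Note: the gap between the raw 10x and the 10 to 15% seen within ROOT can be put in rough numbers with Amdahl's law. This is a back-of-the-envelope sketch, not a measurement. It assumes that "10 or 15% faster" means an overall speedup of 1.10x to 1.15x and that the 10x applies only to the decompression part of the run time; the function names are illustrative.

```python
def overall_speedup(fraction, component_speedup):
    """Amdahl's law: overall speedup when only `fraction` of the run time is accelerated."""
    return 1.0 / ((1.0 - fraction) + fraction / component_speedup)


def implied_fraction(overall, component_speedup):
    """Invert Amdahl's law: share of the run time spent in the accelerated part."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / component_speedup)


if __name__ == "__main__":
    # Assumption: lz4 decompresses 10x faster than zlib in isolation.
    for overall in (1.10, 1.15):
        f = implied_fraction(overall, 10.0)
        print(f"overall {overall:.2f}x -> decompression share of run time ~ {f:.1%}")
```

Under these assumptions decompression would account for roughly 10 to 15% of the total read time, which is consistent with the expectation that the gain grows once bulk I/O raises the relative share of (de)compression.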
Brian: Yes, I would like Zhe to look at it.
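Note: a minimal sketch of what such a buffer-size test could look like outside of ROOT. It uses only zlib from the Python standard library (lz4 is not in the standard library, so it is not compared here), the payload is synthetic, and the helper names and buffer sizes are made up for illustration. Results depend on the machine and on the data.

```python
import random
import time
import zlib


def make_payload(size, seed=0):
    """Synthetic, moderately compressible data built from a repeated 64 KiB block."""
    rng = random.Random(seed)
    block = bytes(rng.choices(b"abcdefgh", k=64 * 1024))
    return (block * (size // len(block) + 1))[:size]


def decompression_throughput(payload, buffer_size, level=6):
    """Compress the payload in independent buffers, then time decompressing all of them.

    Returns the decompression throughput in MB/s of uncompressed data.
    """
    chunks = [payload[i:i + buffer_size] for i in range(0, len(payload), buffer_size)]
    compressed = [zlib.compress(chunk, level) for chunk in chunks]
    start = time.perf_counter()
    restored = [zlib.decompress(blob) for blob in compressed]
    elapsed = time.perf_counter() - start
    assert b"".join(restored) == payload  # sanity check: the round trip is lossless
    return len(payload) / elapsed / 1e6


if __name__ == "__main__":
    payload = make_payload(8 * 1024 * 1024)
    # From small buffers up to one "absurdly large" buffer holding everything.
    for buffer_size in (4 * 1024, 32 * 1024, 1024 * 1024, 8 * 1024 * 1024):
        rate = decompression_throughput(payload, buffer_size)
        print(f"buffer {buffer_size:>9} bytes: {rate:8.1f} MB/s")
```

The same loop could be repeated with another codec to see whether the per-buffer overhead differs between algorithms.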
David A is still planning to propose patch(es) that improve the performance of zlib by using vector instructions. He will send the patches to ROOT, Cloudflare and even zlib itself.
Enrico: In the upcoming enhancement to TDataFrame, we will need the ability to merge TTrees in parallel within a process.
Philippe: Witek has written the necessary code in the context of GeantV and it (still) needs to be introduced in ROOT proper. See
Brian: I will be interested in what we have so far.
Enrico: If the bulk I/O is introduced via a TTreeReader or TTreeReader-compatible interface, we will be able to use it easily.
Brian: design done, hoping to have pull request next week.
standalone TTreeReaderFast next week
community white paper the week after
integrate in TTreeReader the week after (2nd week of Feb)
Brian: Still trying to understand internals of TDataFrame. Benchmarking needed to understand which new (improved) I/O parts are needed.
Brian: I think we can be clever in the TTreeProcessor to optimize the I/O access/work. One issue (for the OS I/O) is the fact that each TBB task has its own TFile and its own TTree.
Danilo/Pere: Indeed we could reduce memory/resources by having only one TFile/one TTreeCache.
Philippe: There are trade-offs; in particular we will need to explore a new ‘state’ for the TBuffer (for example copying the content of the TTreeCache as is, to decompress it later; the downside being that this prevents ‘zero’ copy to the decompression engine).
Brian: For the TTreeCache, we could imagine a new organization of tasks where we ask TBB to do the loading from file and then the decompression (using one TBB task per basket) immediately, and *then* the content is available for the actual worker task.
Viktor: On my side, I can now even read CMS files. Still working through a few small issues with TClonesArray and pointers that are always null.
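Note: the task organization described above can be sketched in a few lines. This is only an illustration of the idea, not ROOT or TBB code: a standard-library thread pool stands in for TBB, zlib stands in for the basket compression, and `load_baskets`, `process_basket` and `run_pipeline` are hypothetical names.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def load_baskets(n_baskets, basket_size=64 * 1024):
    """Stand-in for reading a cluster of compressed baskets from the file."""
    return [zlib.compress(bytes([i % 256]) * basket_size) for i in range(n_baskets)]


def process_basket(data):
    """Stand-in for the worker task; it only ever sees decompressed content."""
    return sum(data[:16])


def run_pipeline(n_baskets=8, n_threads=4):
    # Step 1: load the compressed baskets from file.
    compressed = load_baskets(n_baskets)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Step 2: submit one decompression task per basket immediately.
        tasks = [pool.submit(zlib.decompress, blob) for blob in compressed]
        # Step 3: the worker consumes each basket once its content is available.
        return [process_basket(task.result()) for task in tasks]


if __name__ == "__main__":
    print(run_pipeline())
```

In this arrangement the worker never calls the decompression itself; it only waits on the basket it needs next while the remaining baskets are decompressed in the background.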