Attendees: Danilo, David A, Guilherme A, Jim P, Pere, Philippe, Vassil, Zhe.
David A: Continued to work on the Intel on-chip compression technology. I was able to make it work in a simple example (a TTree with a simple branch). Writing: 1m54s, 200 MB; with hardware compression: 20s, 230 MB. Reading: the hardware is slower than the software path. The problem is the way we do the decompression on the ROOT side: ROOT asks to read to the end of the stream, but the hardware does not, so we need to loop. There are 9 bytes left over (overhead introduced by ROOT for metadata), and there are issues with handling those last few bytes. [Supporting this "do not read to the end of the stream" behavior requires a ROOT patch.] I also have access to an ARM machine with hardware compression, but there is a mismatch between the hardware buffer size and the basket sizes. In one of the examples, 2.7M out of 6M compression calls were compressing a single byte, so I added a threshold below which we do not even attempt compression. The open question is how to do large-scale testing. One option is the LTTng tracing framework: I added tracepoints in ROOT (per user, per process, per CPU), and the output can be customized via Python. It is faster than printf :) . For example, I added a test for "do we compress the exact same data multiple times?". So I am analyzing the pattern of behavior and usage of compression. The maximum size is 16 MB, but the majority of compression calls are for small data. I tried to hook it into CMSSW but ran into some problems; after a few fixes in the CMS code, I was able to run 100 events, but there is still some instability.
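[For context, a minimal hypothetical sketch (not the actual ROOT patch) of the two ideas above: a size threshold before attempting hardware compression, and an LTTng-UST tracef() tracepoint recording each compression request. CompressWithZlib, CompressWithHardware, and the threshold value are made-up placeholders.]

```cpp
// Hypothetical sketch, not the actual ROOT code: skip the hardware engine
// for tiny buffers and record every request with an LTTng-UST tracepoint.
// Build against lttng-ust and link with -llttng-ust.
#include <lttng/tracef.h>  // tracef(): printf-style ad-hoc tracepoint
#include <cstddef>

// Placeholder back-ends standing in for the software and hardware paths.
static int CompressWithZlib(const char *, std::size_t, char *, std::size_t) { return 0; }
static int CompressWithHardware(const char *, std::size_t, char *, std::size_t) { return 0; }

// Threshold below which offloading is not worth it (value is made up;
// it would come from the kind of measurements described above).
constexpr std::size_t kMinHwCompressSize = 1024;

int CompressBuffer(const char *src, std::size_t srcSize, char *dst, std::size_t dstCap)
{
   // Each request is traced, so the size distribution can be analyzed
   // offline, e.g. with babeltrace or a custom Python script.
   tracef("compress request: srcSize=%zu dstCap=%zu", srcSize, dstCap);

   if (srcSize < kMinHwCompressSize)
      return CompressWithZlib(src, srcSize, dst, dstCap);    // small: software path
   return CompressWithHardware(src, srcSize, dst, dstCap);   // large: hardware engine
}
```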
David A: I still need to push a few more patches.
Zhe: First version of using TBB to parallelize basket decompression. I have two issues: (a) I used malloc and tried to replace it with the TBB allocator, but I see crashes. (b) If there is a TTreeCache miss, I try to immediately schedule the decompression on the main thread, which is then blocked until it is all done. I use a task_group directly.
Danilo: We have managed to move all direct usage of TBB to the TThreadExecutor (including for the parallel unzipping). You may want to start using it; I can help you further.
Danilo: For the allocator you may want to look at tcmalloc and jemalloc.
Philippe: Couldn’t you pre-allocate the memory?
Zhe: I will take a look at this indeed.
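[A minimal sketch of the two approaches in this exchange, with a hypothetical DecompressBasket() standing in for the real per-basket work: a direct tbb::task_group as Zhe describes, and the same loop expressed through ROOT::TThreadExecutor as Danilo suggests.]

```cpp
// Sketch only: DecompressBasket() is a made-up stand-in for the real work.
#include <tbb/task_group.h>
#include <ROOT/TThreadExecutor.hxx>
#include <vector>

static void DecompressBasket(int /*basketIndex*/) { /* ... unzip one basket ... */ }

// (a) Direct use of tbb::task_group, as in the current first version.
void UnzipWithTaskGroup(const std::vector<int> &baskets)
{
   tbb::task_group group;
   for (int idx : baskets)
      group.run([idx] { DecompressBasket(idx); });
   group.wait();   // the caller blocks here until all baskets are done
}

// (b) The same work routed through ROOT::TThreadExecutor, which hides the
// direct TBB dependency.
void UnzipWithExecutor(std::vector<int> baskets)
{
   ROOT::TThreadExecutor executor;
   executor.Foreach([](int idx) { DecompressBasket(idx); }, baskets);
}
```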
Vassil: According to my plan, it would be the last part of the year but I am thinking about summer instead.
Jim: RAS (nothing to report).
Danilo: I need to finish work on DataFrame before tackling shared_ptr.
Danilo: We are currently working on integrating JITing in DataFrame. Then we will add the ability to checkpoint the state of the TDataFrame into a file/tree.
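[A rough illustration of the direction described, not the final interface (which was still evolving): a string-based, JITed filter expression and a snapshot of the processed dataset into a file/tree. Tree, file, and column names are hypothetical.]

```cpp
// Illustration only: names are hypothetical and the interface was in flux.
#include <ROOT/TDataFrame.hxx>

void checkpointExample()
{
   ROOT::Experimental::TDataFrame df("events", "input.root");
   // The filter expression is a plain string, compiled (JITed) at run time.
   auto selected = df.Filter("pt > 30 && fabs(eta) < 2.5");
   // Write the current state of the dataset out to a new file/tree,
   // which can later be read back as a checkpoint.
   selected.Snapshot("events_selected", "checkpoint.root", {"pt", "eta"});
}
```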
Guilherme: The VecCore merge request is almost ready. There is an issue with a failure when built-all is enabled: with VecGeom and ninja, a missing dependency causes things to be built out of order.
Vassil: I may have seen a solution to this problem (using a global list of sorts).
Guilherme: We should really think about the I/O use case for VecCore::Real_v, as it might have a different length on different platforms.
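[To make the concern concrete, a hedged sketch assuming VecCore's vecCore::VectorSize<T>() and vecCore::Get() accessors, and not tied to any particular backend: because the number of lanes in Real_v (and hence sizeof(Real_v)) differs between platforms, the raw vector type is not a portable on-disk format, so one option is to convert it to a fixed-layout scalar form before persisting it.]

```cpp
// Sketch under the assumption that vecCore::VectorSize<T>() and
// vecCore::Get() are available for the chosen backend.
#include <VecCore/VecCore>
#include <cstddef>
#include <vector>

template <typename Real_v>
std::vector<double> ToScalars(const Real_v &v)
{
   std::vector<double> out(vecCore::VectorSize<Real_v>());
   for (std::size_t i = 0; i < out.size(); ++i)
      out[i] = vecCore::Get(v, i);   // lane-by-lane copy into a fixed layout
   return out;                        // platform-independent, suitable for I/O
}
```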