ROOT I/O Meeting
Discussion on VecCore vector serialization:
* need to make sure the I/O understands the layout
* need to allow conversion to and from different implementations.
* similar to std::array.
* first let’s do a proof of concept (using #ifdef)
* one issue is how to deal with the fact that VcVector has an implicit (default) length.
Started to debug the issue of writing classes that have an std::array written column-wise. (writing is fine, reading back via TBranchElement is okay).
std::shared_ptr is next.
Viktor (via email):
1. I validated running spark-root with various notebooks: jupyter/zeppelin/toree, with others to come (both scala and python)
2. Using the analytix CERN cluster, spark-root + histogrammar + ROOT works really nicely (with pyspark).
histogrammar aggregates on the executor nodes and allows further parallelization. I then use ROOT for plotting and manipulating the histograms. Plots below are for 1.2TB of public data running 240 parallel processes. Therefore
3. I wrote a few guides on how to use spark-root (scala and python), and on how to get going with analytix at CERN. It’s all in the diana-hep/spark-root git repository; the notebooks are also there.
4. CERN IT got access to a cluster in UK from Intel - and we transferred some of this public data - I want to basically run benchmark queries on that….
5. You mentioned that I could provide description of the project on root’s website - it could be good time to start - I would just need the entry point for what you guys do for that?
you mentioned drupal (didn’t google it yet), but basically whatever you guys do that - I just need a starting point…
Brian: distracted this week.
DavidA: Got CMSSW to use hardware-based compression. Need to use rpath for security reasons.
Could not find a way to enable and disable hardware compression.
First attempt failed (zlib level 7); even a simple example failed. Uncompressing in one go works, but when uncompressing part by part, the unzip reports that it does not use the end of the stream. I am trying to understand the problem.
Intel skyline system with hardware zip on the motherboard: access in April.
Philippe: I recommend simplifying the example by using TBufferFile directly.
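The one-go versus part-by-part behaviour described above can be checked in isolation. Below is a minimal sketch using Python's standard zlib module; this is an assumption made for illustration and exercises neither TBufferFile, CMSSW, nor the hardware path. The function names, payload and chunk size are hypothetical. It inflates the same level-7 stream both ways and uses the decompressor's `eof` flag to verify that the end-of-stream marker was actually consumed.

```python
import zlib


def inflate_in_one_go(blob: bytes) -> bytes:
    """Inflate the whole compressed buffer with a single call."""
    return zlib.decompress(blob)


def inflate_in_parts(blob: bytes, chunk_size: int = 64) -> bytes:
    """Inflate the buffer piece by piece and check the stream really ended."""
    inflater = zlib.decompressobj()
    pieces = []
    for start in range(0, len(blob), chunk_size):
        # Feed one slice of compressed input at a time.
        pieces.append(inflater.decompress(blob[start:start + chunk_size]))
    # Collect any output still buffered inside the decompressor.
    pieces.append(inflater.flush())
    if not inflater.eof:
        # The end-of-stream marker was never reached: input was incomplete.
        raise ValueError("end of compressed stream not reached")
    return b"".join(pieces)


# Hypothetical payload, compressed at level 7 as in the failing attempt.
payload = b"ROOT basket payload " * 500
blob = zlib.compress(payload, 7)

one_go = inflate_in_one_go(blob)
in_parts = inflate_in_parts(blob, chunk_size=64)
print(one_go == payload, in_parts == payload)
```

If both paths agree with a software-only zlib, a disagreement seen with the hardware library points at how the final chunk and the end-of-stream state are handled there, rather than at the calling code.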
DavidA: Upstream needs a tweak to work with clang. There is also a regression at levels 6 and 9 because some parts are not inlined (due to having two hash functions).
Before moving on to ROOT, I am waiting for the patch to be judged good enough for cloudflare (or officially rejected).
Danilo: Did you compare vanilla cloudflare to vanilla zlib?
DavidA: Yes, a presentation was made a few weeks ago at the CMS core software meeting.
Brian: The problem is that 10% of the nodes on the grid do not have some of the assembly instructions used by cloudflare (and thus the code seg faults on those machines). We have a fork that has a run-time-linking fallback to zlib.
Zhe: I figured out why lz4 was slower than zlib; it was due to using -fPIC when compiling lz4, which reduces the performance by a factor of 10! Now it looks like lz4 is 4 times faster.
Philippe: Make sure that you are comparing at the same compression level.
Zhe: Also changing the API used improved the compression speed.
Philippe: Why is -fPIC so much slower? Don’t we need relocatable code?
Zhe: I will check. I am also re-running the test; I should be done by the end of next week.
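Philippe's point about comparing at the same compression level can be made explicit by recording the level next to every measurement. A minimal sketch follows, assuming Python's standard zlib and bz2 modules as stand-ins (lz4 is not in the standard library, and numeric levels of different codecs are not guaranteed to mean the same thing); the `measure` helper and the payload are hypothetical.

```python
import bz2
import time
import zlib


def measure(name, compress, decompress, data, levels):
    """Time one codec at each requested level and record the level used."""
    rows = []
    for level in levels:
        start = time.perf_counter()
        blob = compress(data, level)
        seconds = time.perf_counter() - start
        # Round-trip check so a fast but broken setting is not reported.
        if decompress(blob) != data:
            raise RuntimeError(f"{name} level {level} failed to round-trip")
        rows.append({
            "codec": name,
            "level": level,
            "ratio": len(data) / len(blob),
            "seconds": seconds,
        })
    return rows


# Hypothetical, highly repetitive payload; real baskets will behave differently.
data = b"event payload 0123456789 " * 20000
levels = [1, 6, 9]

rows = measure("zlib", zlib.compress, zlib.decompress, data, levels)
rows += measure("bz2", bz2.compress, bz2.decompress, data, levels)

for row in rows:
    print(f"{row['codec']:5s} level={row['level']} "
          f"ratio={row['ratio']:.1f} time={row['seconds']:.4f}s")
```

Keeping the level in each row means a speed ratio is only ever quoted between two rows whose levels are stated, and the same table can be produced once with and once without -fPIC.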
Philippe: I am working on solving a problem with reading back files written by v5 where the name of the class (collections) was incorrectly calculated by CINT (using a ‘relative’ class name as the template argument). Also solving a problem with the CollectionProxy of bitset and vector<bool>; the CollectionProxy first created was unusable, but a clone of the proxy works properly (so TTree is fine).