ROOT I/O Meeting

Europe/Zurich
32/1-A24 (CERN)

Brian Paul Bockelman (University of Nebraska-Lincoln (US)), Philippe Canal (Fermi National Accelerator Lab. (US))

Attendees: Danilo, Jakob, Peter, Oksana, Guilherme, Brian.

Peter: As part of the CCE, Taylor is planning work on parallel I/O, i.e. I/O on machines with very high core counts.

Danilo: There is a thematic CERN School of Computing at the beginning of June, so watch out for overlap.

Danilo: Going over deficiencies. Today, with Enrico, we fixed a bug in TDataFrame so that it can handle CMS nano-AOD (PR#1532). We still have a problem inferring the type of a given branch, in particular for arrays.

Philippe: Look at GetLenStatic and GetLeafCount. GetNdata returns the current number of elements, i.e. GetLenStatic() * variable_number_of_elements for the last read entry.
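The arithmetic Philippe describes can be sketched as follows. This is a plain Python model, not ROOT code: `LeafModel` and its fields are hypothetical stand-ins for the fixed length reported by GetLenStatic() and for the value of the count leaf in the last read entry. Using a count of 1 for a leaf without a count leaf is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class LeafModel:
    """Hypothetical stand-in for a leaf declared like x[n][3]."""

    len_static: int  # fixed part of the size (what GetLenStatic() reports)
    count: int       # value of the count leaf for the last read entry
                     # (assumed to be 1 when there is no count leaf)

    def ndata(self) -> int:
        # Number of elements for the last read entry, as described for GetNdata().
        return self.len_static * self.count


# A leaf x[n][3] whose last read entry has n = 4.
leaf = LeafModel(len_static=3, count=4)
print(leaf.ndata())
```

The element count therefore changes from entry to entry through the count leaf alone, while the static length is fixed by the declaration of the branch.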

Oksana: I wanted to show the slide I made, but I need to recheck a few things. I finally have a good version of zlib/Cloudflare and it is ready to go. I made some measurements: for compression settings 101 and 106 there is a 30% speed improvement on x86. There are two options for aarch64 support, NEON or direct intrinsics. The rates there improved significantly and are now similar to lz4 (but lz4 is not really good on that platform). Can we merge it?
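For reference, ROOT encodes a compression setting as algorithm * 100 + level, with 1 denoting zlib, so 101 and 106 are zlib at levels 1 and 6. Below is a minimal sketch that decodes such settings and times the two levels. It uses the stock zlib from the Python standard library, not the Cloudflare fork, and a made-up payload, so the timings it prints say nothing about the measurements reported above.

```python
import time
import zlib

# Algorithm codes used in ROOT compression settings (setting = algorithm * 100 + level).
ALGORITHMS = {1: "zlib", 2: "lzma", 4: "lz4"}


def decode_setting(setting: int) -> tuple[str, int]:
    """Split a setting such as 101 into (algorithm name, level)."""
    algorithm, level = divmod(setting, 100)
    return ALGORITHMS[algorithm], level


def time_zlib(payload: bytes, level: int) -> tuple[int, float]:
    """Compress with stock zlib; return (compressed size, elapsed seconds)."""
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    # Round-trip check so the timing is for a valid stream.
    assert zlib.decompress(compressed) == payload
    return len(compressed), elapsed


# Hypothetical payload: a column of increasing 32-bit integers.
payload = b"".join(i.to_bytes(4, "little") for i in range(200_000))

for setting in (101, 106):
    name, level = decode_setting(setting)
    size, elapsed = time_zlib(payload, level)
    print(f"{setting}: {name} level {level}, {size} bytes, {elapsed * 1e3:.1f} ms")
```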

Oksana: A comparison between zlib 1.2.8 and 1.2.11 shows no performance improvement, and the Cloudflare version is still at 1.2.8. The newer releases contain only bug fixes (actually fixes for problems introduced in 1.2.9) and cosmetic changes. So I think we should stay at 1.2.8.

Philippe: Let’s keep it at 1.2.8 and I will look at PR#1527.

Oksana: Fons provided files from a genetics framework contrasting lzma, zip and lz4: 9 GB with lzma 1, 18 GB with lz4. If you do a Map you first see a big gap, and then the baskets seem empty. I am still digging around to understand what’s going on.

Brian: Some of the strangeness looks related to how this file was written. In some cases the compression result is very different between lzma and lz4 for the ‘same’ basket (a compression factor three orders of magnitude better). This seems very odd and may indicate that the files/data are in fact different and/or that the settings are very different.
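The kind of check Brian describes can be sketched as below. It computes the compression factor (uncompressed size divided by compressed size) of each basket in the two files and flags baskets where the factors differ by about three orders of magnitude. The function names, the threshold and all the sizes are hypothetical; in practice the sizes would come from the basket information of the actual files.

```python
def compression_factor(uncompressed_bytes: int, compressed_bytes: int) -> float:
    """Uncompressed size divided by compressed size."""
    return uncompressed_bytes / compressed_bytes


def suspicious_baskets(baskets, threshold=1000.0):
    """Return indices of baskets whose two compression factors differ by >= threshold.

    Each basket is (raw_a, compressed_a, raw_b, compressed_b), where a and b
    are the two files being compared (e.g. the lzma and lz4 versions).
    """
    flagged = []
    for index, (raw_a, comp_a, raw_b, comp_b) in enumerate(baskets):
        factor_a = compression_factor(raw_a, comp_a)
        factor_b = compression_factor(raw_b, comp_b)
        ratio = max(factor_a, factor_b) / min(factor_a, factor_b)
        if ratio >= threshold:
            flagged.append(index)
    return flagged


# Hypothetical sizes in bytes: (raw lzma, compressed lzma, raw lz4, compressed lz4).
baskets = [
    (32000, 8000, 32000, 12000),  # factors 4 and about 2.7: unremarkable
    (32000, 16, 32000, 16000),    # factors 2000 and 2: three orders of magnitude apart
]
print(suspicious_baskets(baskets))
```

A flagged basket is a hint that the two files do not hold the same data or were written with very different settings, rather than a statement about the algorithms themselves.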

Jim: I have been working on generalizing the relationship between arrays in a TBasket and objects in the programming environment. I have gone deeper into how Parquet files work. I remember Jakob’s presentation comparing ROOT and Parquet. Selective reading was faster in ROOT but not in Parquet, which sounded odd as the formats are semantically similar. Looking deeper, I realized that the file was badly configured, with the buffer size being 1000 times smaller than recommended by the Parquet documentation. A more in-depth comparison of Parquet and ROOT might be a good talk for the ROOT I/O workshop, making sure to compare apples to apples. I am now in a better position to express the advantages and disadvantages both qualitatively and quantitatively.

Jakob: I am also very interested. For my presentation, I did not make any effort to tune the other formats to get the best performance out of them. For Parquet, I used the sample application as a pattern.

Jim: In the sample, the data set is small, and the buffer size was likely set small so that you could still see a second partition.

Jakob: I am looking at tutorial applications/code where the core functions of TTree are used, to understand better how they need to be updated for the v7 iterator- or range-based interface.

Guilherme: Still recovering from being away. The plan for I/O in the BufferMerger is to move it from thread-based to task-based.

Brian: Bulk I/O presentation no sooner than February 5th.

Philippe: Next ROOT I/O Workshop on February 21st, with Jim’s talk and Oksana’s talk.

 

    • 16:00 - 16:20
      Round Table (20m)