Attendees: Brian, Oksana, Amadio, Peter, Zhe, Danilo, Guilherme, Muhammad Alhroob.
Presentation by Oksana:
Brian: A couple of questions were raised. Is it worthwhile to put Zstd into ROOT?
Peter: We would be interested in trying this out.
Brian: We are going to look at the LHCb tuple and the CMS nanoAOD file, as well as a few other yardsticks.
Philippe: What do you think of the latest QAT results?
Brian: They look fine for trigger/DAQ, but not for our use case (there is still the superuser requirement, etc.).
GSoC proposal from Giulio: Efficient storage of ROOT files in a git repository
Physicists often need to store ROOT files inside a git repository, mostly to have them versioned and for the convenience of shipping them together with the source.
However, because git treats files as atomic units and ROOT files commonly behave as "compressed blobs", this can result in extremely large git repositories: git is unable to compress the files any further, even when they differ very little from one another.
The aim of the project is to provide a tool that converts ROOT files into a form suitable for storage in a git repository, using git's object store model to store separate, uncompressed TKeys. Assuming that the ROOT files people store in a git repository usually have only minor changes between them, this should allow git to efficiently group and compress similar entries, possibly resulting in much more compact repositories.
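To illustrate why storing TKeys separately could help, the sketch below computes blob IDs the way git does for `git hash-object` (SHA-1 over a `blob <size>` header, a NUL byte, and the content). It assumes, hypothetically, that each ROOT file has already been unpacked into a mapping from key name to uncompressed payload bytes; extracting those payloads from a real ROOT file is not shown, and `git_blob_id` and `new_blobs` are illustrative names. Keys whose payload is unchanged between two versions map to the same blob ID, so git would store them only once.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 object ID git assigns to a blob with this content."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def new_blobs(old: dict[str, bytes], new: dict[str, bytes]) -> set[str]:
    """Blob IDs needed for `new` that are not already stored for `old`."""
    stored = {git_blob_id(payload) for payload in old.values()}
    return {git_blob_id(payload) for payload in new.values()} - stored


# Hypothetical uncompressed TKey payloads for two versions of the same file.
v1 = {"hpt": b"pt histogram bins ...", "heta": b"eta histogram bins ..."}
v2 = {"hpt": b"pt histogram bins ...", "heta": b"eta histogram bins (updated)"}

print(len(new_blobs(v1, v2)))  # → 1: only the changed key adds a new blob
```

If the whole ROOT file were stored as one compressed blob instead, any change would produce an entirely new object.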
The project is divided into four parts:
A helper function to stream ROOT files to a git blob store.
A helper function to create a ROOT file from a git blob store.
Profile the speed of the above operations and the scalability of the storage in terms of size.
Optional: provide a native API to retrieve deserialised objects rather than ROOT files.
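Parts 1 to 3 of the plan could be prototyped against an in-memory stand-in for git's object store before touching a real repository. The sketch below is hypothetical throughout: `store_file` and `restore_file` are illustrative names, the object store is a plain dict keyed by blob ID rather than `.git/objects`, and a ROOT file is again modelled as a mapping from key name to uncompressed payload. Each object is zlib-compressed together with its header, as git does for loose objects, and the manifest plays the role of a git tree. The object count after storing two versions gives a crude size measure for part 3.

```python
import hashlib
import zlib


def put_blob(store: dict[str, bytes], data: bytes) -> str:
    """Add a blob to the store, zlib-compressed like a git loose object."""
    raw = b"blob %d\0" % len(data) + data
    oid = hashlib.sha1(raw).hexdigest()
    store.setdefault(oid, zlib.compress(raw))  # identical content is stored once
    return oid


def get_blob(store: dict[str, bytes], oid: str) -> bytes:
    """Inflate a stored object and strip its blob header."""
    raw = zlib.decompress(store[oid])
    return raw[raw.index(b"\0") + 1:]


def store_file(store: dict[str, bytes], keys: dict[str, bytes]) -> dict[str, str]:
    """Hypothetical part 1: put every key into the store, return a manifest."""
    return {name: put_blob(store, payload) for name, payload in keys.items()}


def restore_file(store: dict[str, bytes], manifest: dict[str, str]) -> dict[str, bytes]:
    """Hypothetical part 2: rebuild the key -> payload mapping from a manifest."""
    return {name: get_blob(store, oid) for name, oid in manifest.items()}


store: dict[str, bytes] = {}
v1 = {"hpt": b"pt bins ...", "heta": b"eta bins ..."}
v2 = {"hpt": b"pt bins ...", "heta": b"eta bins (updated)"}

m1 = store_file(store, v1)
m2 = store_file(store, v2)
assert restore_file(store, m2) == v2
print(len(store))  # → 3: the unchanged "hpt" blob is shared by both versions
```

A real implementation would write the manifest as a git tree and let git's own packing find deltas between similar uncompressed blobs.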
What do you think?
Brian: We should definitely tie this in to some of the open problems we have, i.e. identify what else could benefit from it. Let's not make this a standalone thing that might eventually die off.
Zhe: I have two things on my plate: the signal handler on macOS and parallel unzip. I should have some time to work on this in the next few weeks.
Philippe: A workshop on parallel I/O, i.e. I/O on HPC, will be held in August.
Danilo: There is a Summer School of Computing from 3 to 9 June.
Guilherme: We ought to make the file merger test less dependent on the data content. For example, we could remove the histograms from the files.