168th ROOT Parallelism, Performance and Programming Model Meeting

Name: 168th ROOT Parallelism, Performance and Programming Model Meeting
Start: 2024-07-18T16:00:00+02:00
End: 2024-07-18T17:30:00+02:00
Location: CERN

Thursday 18 Jul 2024, 16:00 → 17:30 Europe/Zurich

32/S-C22 (CERN)


Marta Czurylo (CERN), Vincenzo Eduardo Padulano (CERN)

Videoconference

ROOT Team Meeting

Zoom Meeting ID: 97374667082
Host: Axel Naumann
Alternative hosts: Bertrand Bellenot, Lorenzo Moneta, Danilo Piparo, Enrico Guiraud, Jakob Blomer, Vincenzo Eduardo Padulano

PPP 18.07.2024

Speaker: Ida Caspary
Title: RNTupleTTreeChecker

Discussion

Instead of looping over the fields, the checker should just collect all the field names from both datasets and compare the two sets to see if any field is missing.
Can the checker be run to compare two different RNTuples?
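The set-based comparison suggested above could look roughly like the following. This is a minimal sketch in plain Python, not the checker's actual code: the function name `compare_field_sets` and the example field names are hypothetical, and obtaining the field names from a TTree or an RNTuple is left out.

```python
def compare_field_sets(ttree_fields, rntuple_fields):
    """Return the field names missing from each side, using set differences."""
    ttree_set = set(ttree_fields)
    rntuple_set = set(rntuple_fields)
    return {
        # Present in the TTree but not found in the RNTuple.
        "only_in_ttree": sorted(ttree_set - rntuple_set),
        # Present in the RNTuple but not found in the TTree.
        "only_in_rntuple": sorted(rntuple_set - ttree_set),
        # Present on both sides; candidates for further type and data checks.
        "common": sorted(ttree_set & rntuple_set),
    }


# Hypothetical field names, for illustration only.
result = compare_field_sets(["pt", "eta", "phi"], ["pt", "eta", "mass"])
```

Both difference sets are computed in a single pass each, and an empty `only_in_ttree` and `only_in_rntuple` means no field is missing on either side.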
Does the checker also check the data?
- Not yet, the plan is to introduce some data checking. Some ideas so far:
  - Randomly pick values from the same entry number in both datasets and check if they are the same
  - Plot histograms of numerical types and compare
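The two data-check ideas above can be sketched as follows. This is an illustrative sketch under assumptions, not the planned implementation: columns are modelled as plain Python lists already read into memory, and the helper names `sample_entries_match` and `histogram` are hypothetical.

```python
import random


def sample_entries_match(col_a, col_b, n_samples=100, seed=0):
    """Compare values at randomly chosen entry numbers of two columns."""
    if len(col_a) != len(col_b):
        return False
    rng = random.Random(seed)  # fixed seed so a failing check is reproducible
    n = min(n_samples, len(col_a))
    for i in rng.sample(range(len(col_a)), n):
        if col_a[i] != col_b[i]:
            return False
    return True


def histogram(values, n_bins, lo, hi):
    """Count values falling into n_bins equal-width bins over [lo, hi)."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v < hi:
            # Clamp to guard against floating-point rounding at the upper edge.
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts


# Hypothetical column contents, for illustration only.
a = [0.5, 1.5, 2.5, 3.5]
b = list(a)
same_samples = sample_entries_match(a, b)
same_shape = histogram(a, 4, 0.0, 4.0) == histogram(b, 4, 0.0, 4.0)
```

Note that matching histograms only show that the distributions agree; unlike the entry-by-entry sampling, they do not detect values that were reordered between entries.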
Printing all the fields may get out of hand in case of big datasets with hundreds of fields. Maybe we can rethink the way the results of the checks are presented?
Are you planning to have a check of identity of all the values of all the fields?
- We can also add that to the list of data checks.
Should the checker also offer a public API to access all the information programmatically? This could help experiments with more complex checking workflows.
- Yes
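One possible shape for such programmatic access, which also touches on the earlier concern about printing hundreds of fields, is a structured report object with a truncated summary. This is only a sketch: the class `CheckReport` and its attributes are hypothetical and do not correspond to any existing ROOT interface.

```python
from dataclasses import dataclass, field


@dataclass
class CheckReport:
    """Hypothetical container for the results of a comparison."""

    missing_fields: list = field(default_factory=list)
    type_mismatches: dict = field(default_factory=dict)

    @property
    def ok(self):
        # The datasets are considered consistent if nothing was flagged.
        return not self.missing_fields and not self.type_mismatches

    def summary(self, max_items=5):
        """One-line summary that lists at most max_items field names."""
        shown = self.missing_fields[:max_items]
        hidden = len(self.missing_fields) - len(shown)
        line = f"{len(self.missing_fields)} missing field(s): {', '.join(shown)}"
        if hidden > 0:
            line += f" (+{hidden} more)"
        return line + f"; {len(self.type_mismatches)} type mismatch(es)"


# Hypothetical result with eight missing fields, for illustration only.
report = CheckReport(missing_fields=[f"f{i}" for i in range(8)])
text = report.summary(max_items=3)
```

Callers with more complex workflows can inspect `missing_fields` and `type_mismatches` directly, while the summary keeps the printed output short for datasets with many fields.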
In some cases, for example when converting TTree data to RNTuple via the RNTupleImporter, some things will change, e.g. some data types (Long64_t gets normalised) or parts of the schema itself. The result is still a compatible dataset, although a strict comparison will report these changes as mismatches. Maybe we should introduce a separate report of these "compatible mismatches".
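A report of "compatible mismatches" could be built on a three-way classification of type pairs, sketched below. The lookup table is an assumption made for illustration: it is not exhaustive, the exact normalised names should be checked against what the RNTupleImporter actually produces, and the function name `classify_type_pair` is hypothetical.

```python
# Assumed, non-exhaustive table of type names that are rewritten on import.
NORMALISED_TYPES = {
    "Long64_t": "std::int64_t",
    "ULong64_t": "std::uint64_t",
}


def classify_type_pair(ttree_type, rntuple_type):
    """Classify a pair of type names as identical, compatible or mismatch."""
    if ttree_type == rntuple_type:
        return "identical"
    # A known normalisation is reported separately from a real mismatch.
    if NORMALISED_TYPES.get(ttree_type) == rntuple_type:
        return "compatible"
    return "mismatch"


kind = classify_type_pair("Long64_t", "std::int64_t")
```

Keeping the "compatible" category separate lets a strict mode still flag such pairs while a default mode reports them without failing the check.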
We should discuss what the checker should do about complex classes.
- This needs deep investigation of what info we can get from TTree
We should really strive to enable comparisons RNTuple vs RNTuple as well as TTree vs TTree.
- Maybe comparing page checksums?
- That does not work in general and opens a can of worms.
- Also, this tool should really give users certainty about the results, so it is better if it does not rely on the page checksums.
How can we bring the checker into ROOT?
- After some discussion with Jakob, it seems better for now to avoid having a CLI in ROOT. We should try to add the programmatic API to the ROOT library first.


- 1. CLI utility to validate RNTuple data
  Speaker: Ida Friederike Caspary (CERN)
  RNTupleTTreeChecker.pdf
  RNTupleTTreeChecker.pptx
