168th ROOT Parallelism, Performance and Programming Model Meeting

Europe/Zurich
32/S-C22 (CERN)

Marta Czurylo (CERN), Vincenzo Eduardo Padulano (CERN)
Videoconference: ROOT Team Meeting
Zoom Meeting ID: 97374667082
Host: Axel Naumann
Alternative hosts: Bertrand Bellenot, Lorenzo Moneta, Danilo Piparo, Enrico Guiraud, Jakob Blomer, Vincenzo Eduardo Padulano

PPP 18.07.2024

Speaker: Ida Caspary
Title: RNTupleTTreeChecker

Discussion

  • Instead of looping, the checker should collect all the field names and compare the two sets to see whether any field is missing.

  • Can the checker be run to compare two different RNTuples?

  • Does the checker also check the data?

    • Not yet, the plan is to introduce some data checking. Some ideas so far:
      • Randomly pick values from the same entry number and check whether they are the same
      • Plot histograms of numerical types and compare
  • Printing all the fields may get out of hand in case of big datasets with hundreds of fields. Maybe we can rethink the way the results of the checks are presented?

  • Are you planning to check that all the values of all the fields are identical?

    • We can also add that to the list of data checks.
  • Should the checker also offer a public API to access all the information programmatically? This could help experiments with more complex checking workflows.

    • Yes
  • In some cases, for example when converting TTree data to RNTuple via the RNTupleImporter, some things will change, e.g. some data types (Long64_t gets normalised) or parts of the schema itself. These changes still produce a compatible dataset, although a strict comparison will report them as mismatches. Maybe we should introduce a report of these "compatible mismatches".

  • We should discuss what the checker should do about complex classes.

    • This needs deep investigation of what info we can get from TTree
  • We should really strive to enable RNTuple vs RNTuple comparisons as well as TTree vs TTree comparisons.

    • Maybe comparing page checksums?
    • That does not work in general and opens a can of worms.
    • Also this tool should really give users certainty about the results, so it's better if it does not rely on the page checksums.
  • How can we bring the checker into ROOT?

    • After some discussion with Jakob, it seems better for now to avoid having a CLI in ROOT. We should try to add the programmatic API to the ROOT library first.
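
Several of the points above (set-based field comparison, a report of "compatible mismatches", and programmatic access to the results) could fit together roughly as in the sketch below. It is plain Python and does not use ROOT. The names `SchemaReport`, `compare_schemas` and `COMPATIBLE_TYPES` are hypothetical, the type pairs in the table are assumptions made for illustration rather than the RNTupleImporter's actual normalisation rules, and the two field dictionaries are made-up inputs.

```python
from dataclasses import dataclass, field

# Hypothetical table of type pairs treated as equivalent after conversion.
# These entries are assumptions for illustration only.
COMPATIBLE_TYPES = {
    ("Long64_t", "std::int64_t"),
    ("ULong64_t", "std::uint64_t"),
}


@dataclass
class SchemaReport:
    """Result of a schema comparison, accessible programmatically."""
    missing_in_first: set = field(default_factory=set)
    missing_in_second: set = field(default_factory=set)
    compatible_mismatches: dict = field(default_factory=dict)
    incompatible_mismatches: dict = field(default_factory=dict)

    def ok(self):
        # Compatible mismatches are reported but do not fail the check.
        return not (self.missing_in_first
                    or self.missing_in_second
                    or self.incompatible_mismatches)


def compare_schemas(first, second):
    """Compare two {field name: type name} mappings using set operations."""
    report = SchemaReport()
    names_first, names_second = set(first), set(second)

    # Set differences find missing fields without an explicit nested loop.
    report.missing_in_second = names_first - names_second
    report.missing_in_first = names_second - names_first

    # Only fields present on both sides can have a type mismatch.
    for name in names_first & names_second:
        pair = (first[name], second[name])
        if pair[0] == pair[1]:
            continue
        if pair in COMPATIBLE_TYPES or pair[::-1] in COMPATIBLE_TYPES:
            report.compatible_mismatches[name] = pair
        else:
            report.incompatible_mismatches[name] = pair
    return report


# Made-up field lists standing in for what would be read from the two datasets.
ttree_fields = {"pt": "float", "eta": "float", "event": "Long64_t", "charge": "int"}
rntuple_fields = {"pt": "float", "event": "std::int64_t", "charge": "float"}

report = compare_schemas(ttree_fields, rntuple_fields)
print("missing in second:", sorted(report.missing_in_second))
print("compatible mismatches:", report.compatible_mismatches)
print("incompatible mismatches:", report.incompatible_mismatches)
print("schemas compatible:", report.ok())
```

Returning a structured report instead of printing every field would also keep the output manageable for datasets with hundreds of fields, since callers can decide which categories to display.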