167th ROOT Parallelism, Performance and Programming Model Meeting

Europe/Zurich
32/S-C22 (CERN)

Marta Czurylo (CERN), Vincenzo Eduardo Padulano (CERN)
Videoconference: ROOT Team Meeting
Zoom Meeting ID: 97374667082
Host: Axel Naumann
Alternative hosts: Bertrand Bellenot, Lorenzo Moneta, Danilo Piparo, Enrico Guiraud, Jakob Blomer, Vincenzo Eduardo Padulano

PPP 23.05.2024

The CROWN framework at KIT

Framework to convert NanoAOD to analysis ntuples.

  • Focused on efficiency and fast processing with minimal dependencies.

  • Having an ntuple framework makes it possible to produce a clearly defined state of selections and corrections for the many people who might need it.

  • Expensive calculations of derived quantities are also run only once.

  • Used at KIT by the CMS group: O(10) analyses.

  • Python configuration, automatic generation of the C++ program.

Automatic code generation

The Python classes generate C++ code for RDataFrame (with no JITting). The general guideline is to keep these automatically generated functions as simple as possible, so that they can be kept around and reused by multiple analyses.
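To illustrate the idea of Python classes emitting C++ for RDataFrame, here is a minimal sketch. It is not CROWN's actual code: the `Producer` class, the `generate` helper, and the C++ function names `selectGoodMuons` and `countTrue` are hypothetical. The only ROOT feature assumed is that `Define` accepts a compiled callable plus a list of input columns, which avoids a JIT-compiled string expression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Producer:
    """Hypothetical description of one derived quantity (not CROWN's real class)."""
    output: str    # name of the new column
    function: str  # name of a precompiled C++ function (assumed to exist)
    inputs: tuple  # names of the input columns

    def to_cpp(self, df_in: str, df_out: str) -> str:
        # Quote the column names for the C++ initializer list.
        cols = ", ".join(f'"{c}"' for c in self.inputs)
        # Emit a Define call that takes a compiled callable, not a string expression.
        return f'auto {df_out} = {df_in}.Define("{self.output}", {self.function}, {{{cols}}});'


def generate(producers, first="df0"):
    """Chain the producers into consecutive RDataFrame nodes df0 -> df1 -> ..."""
    lines = []
    current = first
    for i, producer in enumerate(producers, start=1):
        nxt = f"df{i}"
        lines.append(producer.to_cpp(current, nxt))
        current = nxt
    return "\n".join(lines)


producers = [
    Producer("good_muons", "selectGoodMuons", ("Muon_pt", "Muon_eta")),
    Producer("n_good_muons", "countTrue", ("good_muons",)),
]
code = generate(producers)
print(code)
```

Each producer stays a small, self-describing unit, which is what makes it easy to share between analyses.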

Q&A

Q: Why is reading from remote so close to reading from local in the performance measurements you show?
A: The remote reads are quite close to the compute nodes.
Comment: Alright, but this still shows that with a good enough network remote reads do not add overhead, which is good.

Q: The more RDF improves, the better for CROWN. What can we do as the ROOT team to support you even more?
A: Overall performance is good. The one thing I'm not so happy about is Snapshot. We are required to use JIT there. You can template Snapshot, but that does not scale to the size of our RDataFrame. In principle we can know all the data types of our quantities. We have this information, but currently there is no way for us to use it.
Q: How many template parameters?
A: In an example I show there are roughly 14K outputs.
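The scaling issue can be made concrete with a small sketch. It assumes a ROOT version that provides the templated `Snapshot<Types...>(treename, filename, columns)` overload, and it assumes the generator holds a mapping from output name to C++ type; the helper `snapshot_call` and the column names are hypothetical, not part of CROWN.

```python
def snapshot_call(df, tree, filename, columns):
    """Render a templated Snapshot call from a {column name: C++ type} mapping."""
    # One template argument per output column, in the same order as the names.
    types = ", ".join(columns.values())
    names = ", ".join(f'"{name}"' for name in columns)
    return f'{df}.Snapshot<{types}>("{tree}", "{filename}", {{{names}}});'


# Hypothetical outputs; a real configuration would list every written quantity.
columns = {"pt_1": "float", "q_1": "int", "is_data": "bool"}
call = snapshot_call("df_final", "ntuple", "output.root", columns)
print(call)
```

With roughly 14K outputs, the generated call would carry roughly 14K template arguments in a single instantiation, which is the situation described in the answer above.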

Q: You talked about a second step where you have all the histograms produced for the analysis. How do you handle this and can we support you even more?
A: This is also ROOT-based. A master's student (Max Galli) developed the histogram framework that we are still using today.

Q: You said you try to avoid JITting, but RDF still JITs some things under the hood. Are you fine with that?
A: I believe we are doing the best we can on our side.

Q: About friend tree support: are the friend trees always aligned?
A: We ensure this by always processing friend trees single-core and building them on top of CROWN ntuples. For the ntuples, we have an internal status bit that allows adding friends; if something was produced in MT, we reset it by hand for the main analysis ntuple. One input file, one friend file.
Q: Then you also forbid building a friend where the number of events does not match?
A: Yes indeed.
Q: If you have multiple input trees, do they always come from the same upstream NanoAOD source?
A: Not necessarily, but they come from the same MC production campaign.
Q: How do you guarantee the alignment?
A: You mean that you produce friends and then a new CROWN ntuple with a different NanoAOD input?
Q: Can you befriend those?
A: No, we do not support that. Friends always have to match the CROWN ntuple.
Q: Is there a machinery to ensure this?
A: The workflow manager ensures this, but people can actually do whatever they want.
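The "one input file, one friend file" rule with matching event counts can be sketched as a simple pre-check. This is an illustration only, not the workflow manager's actual logic: `TreeFile` and `check_friends` are hypothetical names, and the entry counts are assumed to have been read from the files beforehand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TreeFile:
    """Hypothetical record: a file name and the number of entries in its tree."""
    path: str
    entries: int


def check_friends(ntuples, friends):
    """Pair the i-th ntuple with the i-th friend file and require equal entry counts."""
    if len(ntuples) != len(friends):
        raise ValueError("expected exactly one friend file per ntuple file")
    for main, friend in zip(ntuples, friends):
        if main.entries != friend.entries:
            raise ValueError(
                f"{friend.path} has {friend.entries} entries, "
                f"but {main.path} has {main.entries}"
            )
    return list(zip(ntuples, friends))


ntuples = [TreeFile("ntuple_0.root", 1000), TreeFile("ntuple_1.root", 850)]
friends = [TreeFile("friend_0.root", 1000), TreeFile("friend_1.root", 850)]
pairs = check_friends(ntuples, friends)
```

Equal entry counts are a necessary condition, not a proof of alignment: a friend built from a different ntuple with the same number of events would pass this check, which is why the friends must be derived from the matching CROWN ntuple.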

Q: You mentioned Dask and more modern approaches. RDF can run autonomously in a distributed fashion once Dask workers are instantiated inside HTCondor jobs. Is this something that interests you?
A: I guess in our case it's a different situation. At our institute we have a large HTCondor system which is distributed over multiple sites. This is the infrastructure we are bound to, so it made the most sense for us to implement it this way. We have a PhD student working on setting up a Dask cluster similar to SWAN with an HTCondor backend.

There are minutes attached to this event.
    • 16:00 - 17:00
      The CROWN Framework at KIT-CMS 1h
      Speaker: Sebastian Brommer (KIT - Karlsruhe Institute of Technology (DE))