174th ROOT Parallelism, Performance and Programming Model Meeting

Europe/Zurich
32/S-C22 (CERN)

Marta Czurylo (CERN), Vincenzo Eduardo Padulano (CERN)

Giovanni Petrucciani's feedback from CMGRDF

CMGRDF: A framework to define computation graphs built on top of RDataFrame. Tailored towards interactive plotting for complex analysis use cases in CMS.

Q&A

  • How can you derive that different operations require the same files?

    • There is a list of operations to be performed and a list of data sources. By comparing the operations against the list of files we can build a map: if some files are already in use, we attach to the existing computation graph.
  • What are the numbers in parentheses on the slide showing the analysis description?

    • 70 is the figure obtained without optimizing the graph building, i.e. just building one graph per selection.
  • Cache based on the input, what kind of input?

    • For each dataset I create a hash from the filename and possibly the timestamp. Once the list of operations is present, I include a hash of the whole computation graph.
  • Does this work at every step of the computation graph?

    • No, at the moment it only caches the final results of the whole computation graph to disk.
  • Do you have an idea about the sparseness?

    • We already prescale and select events before starting the analysis. The ~1TB input dataset is thus pre-skimmed, although quite a few branches left over from the original nanoAOD are never read.
  • Why can't I see the JIT stack traces with VTune?

    • Try with CLING_PROFILE=1 in the environment.
  • Can some of the JITted code be pre-compiled?

    • We already pre-compile some things, not all. I don't know how much we can improve in that sense.
  • What is happening in the building of the graphs that is not RDF JIT?

    • It's the time spent creating the nodes, either in CMGRDF code or directly within RDF. To be investigated.
  • For snapshot, you write directly to EOS?

    • Yes!
    • It would be nice if you could comment on the GitHub issue.
  • Request multicore workers via condor and run ROOT IMT.

    • This works quite well!
  • gSystem->Load: is there an example of when it crashes?

    • I had some examples, not easily reproducible.
    • Would be nice to create one.
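
The first answer above describes building a map from input files to computation graphs, so that operations on files already in use attach to the existing graph. The following is a minimal sketch of that idea only, not CMGRDF's actual code: the function `group_by_inputs`, the operation names, and the file names are all hypothetical, and a "graph" is stood in for by a plain list of operations.

```python
from collections import defaultdict


def group_by_inputs(requests):
    """Group (operation, files) requests by their set of input files.

    Hypothetical helper: operations whose file sets are identical end up
    in the same bucket, standing in for "attach to the existing graph".
    """
    graphs = defaultdict(list)
    for operation, files in requests:
        # frozenset makes the key independent of the order of the files
        key = frozenset(files)
        graphs[key].append(operation)
    return dict(graphs)


# Illustrative requests (names are made up for this sketch)
requests = [
    ("select_muons", ["a.root", "b.root"]),
    ("select_electrons", ["b.root", "a.root"]),
    ("count_jets", ["c.root"]),
]
shared = group_by_inputs(requests)
```

Here the two selections reading `a.root` and `b.root` share one bucket, while `count_jets` gets its own. Partially overlapping file sets would need a finer-grained key, which this sketch does not attempt.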
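
The caching answer above mentions hashing each dataset from its filename (and possibly timestamp) and then including a hash of the whole computation graph. A rough sketch of such a cache key follows, under loud assumptions: the functions `dataset_hash` and `cache_key` are hypothetical, SHA-256 is an arbitrary choice, and the computation graph is represented simply as an ordered list of operation strings, which is not necessarily how CMGRDF represents it.

```python
import hashlib


def dataset_hash(filename, timestamp=None):
    """Hash one dataset from its filename and, optionally, a timestamp."""
    h = hashlib.sha256()
    h.update(filename.encode("utf-8"))
    if timestamp is not None:
        h.update(b"\0")  # separator between filename and timestamp
        h.update(repr(timestamp).encode("utf-8"))
    return h.hexdigest()


def cache_key(datasets, operations):
    """Combine the dataset hashes with the list of operations.

    Assumption: `datasets` is a list of (filename, timestamp) pairs and
    `operations` is an ordered list of strings describing the graph.
    """
    h = hashlib.sha256()
    for filename, timestamp in datasets:
        h.update(dataset_hash(filename, timestamp).encode("ascii"))
    for op in operations:
        h.update(op.encode("utf-8"))
        h.update(b"\0")  # separator so ["ab", "c"] differs from ["a", "bc"]
    return h.hexdigest()


# Illustrative inputs (made up for this sketch)
datasets = [("a.root", 1700000000), ("b.root", 1700000500)]
operations = ["Filter(nMuon > 1)", "Histo1D(Muon_pt)"]
key = cache_key(datasets, operations)
```

Changing a timestamp or any operation yields a different key, so a final result cached on disk under the old key would not be reused.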
    • 16:00 - 17:00
      Feedback from CMGRDF and distributed processing 1h
      Speaker: Giovanni Petrucciani (CERN)