53rd ROOT Parallelism, Performance and Programming Model Meeting

Europe/Zurich
4/S-030 (CERN)

Danilo Piparo (CERN)

Present: Enric, Guilherme, Enrico, Stephan, Sidong, Javi, Lorenzo, Axel, Stefan, Danilo

Latest Programming Model and Performance Improvements in RooFit

Stephan presented his recent changes to RooFit, which replace the linked-list implementation of the RooFit collections with a new one based on std::vector. By replacing some uses of the legacy iterators of the RooFit collections, he observed a 20-30% performance gain in a real-world example.
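To illustrate the kind of change involved, the following self-contained sketch contrasts a legacy, TIterator-style loop with a range-based for loop over a std::vector-backed collection. All names (Arg, LegacyIterator, Collection, sumLegacy, sumModern) are hypothetical and do not reproduce RooFit's actual classes. The sketch only shows where the overhead of the legacy path comes from (a heap-allocated iterator and a virtual call per element) and makes no claim about the 20-30% figure.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a RooFit argument (NOT the real RooAbsArg).
struct Arg {
  double value;
};

// Legacy-style iterator interface, modelled on the TIterator pattern:
// heap-allocated, one virtual call per step, nullptr signals the end.
struct LegacyIterator {
  virtual ~LegacyIterator() = default;
  virtual Arg* Next() = 0;
};

// Hypothetical collection stored in a std::vector instead of a linked list.
class Collection {
public:
  void add(Arg* a) {
    assert(a != nullptr);
    fArgs.push_back(a);
  }

  // Legacy interface, kept so that old user code still compiles.
  std::unique_ptr<LegacyIterator> createIterator() const {
    struct VectorIterator : LegacyIterator {
      explicit VectorIterator(const std::vector<Arg*>& args) : fVec(args) {}
      Arg* Next() override { return fPos < fVec.size() ? fVec[fPos++] : nullptr; }
      const std::vector<Arg*>& fVec;
      std::size_t fPos = 0;
    };
    return std::make_unique<VectorIterator>(fArgs);
  }

  // Modern interface: plain std::vector iterators, usable in range-based for loops.
  std::vector<Arg*>::const_iterator begin() const { return fArgs.begin(); }
  std::vector<Arg*>::const_iterator end() const { return fArgs.end(); }

private:
  std::vector<Arg*> fArgs;
};

// Legacy loop: one heap allocation per loop plus a virtual call per element.
double sumLegacy(const Collection& c) {
  double sum = 0.0;
  std::unique_ptr<LegacyIterator> it = c.createIterator();
  while (Arg* a = it->Next()) sum += a->value;
  return sum;
}

// Modern loop: direct std::vector iteration that the compiler can inline.
double sumModern(const Collection& c) {
  double sum = 0.0;
  for (const Arg* a : c) sum += a->value;
  return sum;
}
```

Both functions compute the same sum; only the iteration mechanism differs, which is why old code keeps working while new code can switch to the cheaper loop.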

The discussion was mainly about how to discourage the use of the legacy (slower) iterators in RooFit. Stephan proposed using a preprocessor macro to activate compiler deprecation messages, which prompted the following comments:

- Use plain warning messages instead of deprecation messages (deprecation is for things we are going to remove, which is not the case here).

- Use a CMake flag to enable the messages. Concern: we do not want to be flooded with messages.

- Do not use "FASTER" or "FUNCTIONS" in the macro name; "MODERN" was suggested but also voted down. Perhaps "BETTER".

- Documentation is key: we want to use it to advertise which practices we do not recommend. However, marking something as deprecated in Doxygen should again be reserved for things that will disappear soon.
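A minimal sketch of the proposed opt-in mechanism, assuming a hypothetical macro name (following the "BETTER" suggestion) and a hypothetical collection class. It uses the standard [[deprecated]] attribute, as in the original proposal, because that is the only portable per-call-site diagnostic in standard C++; the plain warnings suggested above would need a compiler-specific alternative such as GCC's warning function attribute. The macro is off by default, so only users who ask for the hints see them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical opt-in switch: define ROOFIT_SUGGEST_BETTER_INTERFACE (e.g. with
// -DROOFIT_SUGGEST_BETTER_INTERFACE, possibly set by a CMake option) to get a
// diagnostic at every call site of a discouraged function. Off by default, so
// ordinary builds are not flooded with messages.
#ifdef ROOFIT_SUGGEST_BETTER_INTERFACE
#define ROOFIT_SUGGEST_ALTERNATIVE(msg) [[deprecated(msg)]]
#else
#define ROOFIT_SUGGEST_ALTERNATIVE(msg)
#endif

// Hypothetical collection with one legacy and one recommended access path.
class ArgCollection {
public:
  // Minimal legacy-style cursor: Next() returns nullptr when exhausted.
  struct LegacyIterator {
    const std::vector<double>* values;
    std::size_t pos;
    const double* Next() { return pos < values->size() ? &(*values)[pos++] : nullptr; }
  };

  void add(double v) { fValues.push_back(v); }

  // Legacy access path: stays supported, but is discouraged.
  ROOFIT_SUGGEST_ALTERNATIVE("slower legacy iterator; use a range-based for loop instead")
  LegacyIterator createLegacyIterator() const { return LegacyIterator{&fValues, 0}; }

  // Recommended access path: range-based for loops over the collection.
  std::vector<double>::const_iterator begin() const { return fValues.begin(); }
  std::vector<double>::const_iterator end() const { return fValues.end(); }

private:
  std::vector<double> fValues;
};
```

Compiling user code with -DROOFIT_SUGGEST_BETTER_INTERFACE would flag every call to createLegacyIterator(), while default builds stay silent.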


Programming model for model inference with TMVA

Stefan presented a proposal for model inference in TMVA that uses the new RTensor class and interoperates with RDataFrame. The final objective is to be able to inject the response of an ML model into an RDataFrame. RTensor can store the result of that prediction. Interoperability between C++ RTensors and NumPy arrays in Python is also desirable.
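The inference pattern described above can be sketched with a minimal row-major tensor that holds one row of input variables per event, and a model that returns one response per row. Tensor2D and LinearModel are hypothetical stand-ins (a linear response replaces a real ML model); they are not the RTensor or TMVA interface presented at the meeting.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal stand-in for a 2D row-major tensor (NOT TMVA's RTensor).
struct Tensor2D {
  std::size_t rows, cols;
  std::vector<float> data;  // rows * cols values, row-major: one row per event
  float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Hypothetical trained model: a linear response w.x + b stands in for any ML model.
struct LinearModel {
  std::vector<float> weights;
  float bias;

  // Single-event inference: one input vector -> one response.
  float Predict(const std::vector<float>& x) const {
    assert(x.size() == weights.size());
    float y = bias;
    for (std::size_t i = 0; i < x.size(); ++i) y += weights[i] * x[i];
    return y;
  }

  // Batch inference: one row per event -> one response per event.
  std::vector<float> Predict(const Tensor2D& batch) const {
    assert(batch.cols == weights.size());
    std::vector<float> out(batch.rows, bias);
    for (std::size_t r = 0; r < batch.rows; ++r)
      for (std::size_t c = 0; c < batch.cols; ++c)
        out[r] += weights[c] * batch.at(r, c);
    return out;
  }
};
```

In an RDataFrame, a Define callable would presumably call the single-event overload once per entry, while the batch overload shows how a tensor could hold the predictions for many events at once.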

Comments:

- Slide 4:

  * Avoid curly braces in the Predict call. 

  * Is the order of the variables obvious? Do we need names assigned to indices? Response: people are used to keeping track of the order; you expect your data to be in the right order. There was no consensus on this matter.

- Slide 7:

  * In the first Define, we need to pass column names (e.g. "muon.pt()" is not valid).

  * If you have many models, it would be nice to have a Predict object for each of them that knows about its input variables.

  * The type and number of input variables could be stored in the model itself.
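The slide 4 and slide 7 comments can be combined in one sketch: the model stores the names (and hence the number and order) of its inputs, and a per-model Predict object resolves those names against the available columns once, so the caller no longer has to remember the order. NamedModel and Predictor are hypothetical names and the linear response is a placeholder for a real model; this is not the TMVA interface discussed at the meeting.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model that carries the names of its input variables
// alongside its parameters (here: one weight per input, same order).
struct NamedModel {
  std::vector<std::string> inputNames;
  std::vector<float> weights;
  float bias;
};

// Hypothetical per-model Predict object: built once from the columns the
// caller has, it maps each model input to the matching caller column.
class Predictor {
public:
  Predictor(NamedModel model, const std::vector<std::string>& availableColumns)
      : fModel(std::move(model)) {
    assert(fModel.weights.size() == fModel.inputNames.size());
    for (const std::string& name : fModel.inputNames) {
      std::size_t idx = 0;
      while (idx < availableColumns.size() && availableColumns[idx] != name) ++idx;
      if (idx == availableColumns.size())
        throw std::invalid_argument("missing input column: " + name);
      fColumnIndex.push_back(idx);
    }
  }

  // Evaluate one event whose values are given in the caller's column order.
  float operator()(const std::vector<float>& row) const {
    float y = fModel.bias;
    for (std::size_t i = 0; i < fColumnIndex.size(); ++i)
      y += fModel.weights[i] * row.at(fColumnIndex[i]);
    return y;
  }

  std::size_t nInputs() const { return fModel.inputNames.size(); }

private:
  NamedModel fModel;
  std::vector<std::size_t> fColumnIndex;  // model input i -> caller column index
};
```

With a model expecting (pt, eta) and columns ordered (eta, phi, pt), the predictor picks the right entries itself and throws if a required column is missing.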
