EP R&D Software Working Group Meeting
Software R&D Working Meeting Minutes
2025-01-15
Room: André, Vincenzo, Florine, Severin, Joana, Graeme, Lukas, Danilo
Remote: Aaron, Swathi, Peter, Felice, Wahid, Pere
Apologies: Juan C., Pere
News
Graeme: Do we have to reupload the paper when getting the EPRDET DOI?
Swathi: I had to resubmit the paper to arXiv.
Graeme: Sometimes you can just reserve the DOI in advance to keep things consistent. That way it can be written directly in the paper.
Danilo: Is May 7th in the same week as the WLCG workshop?
Graeme: Yes, and it is also the same week as the last AIDAinnova meeting.
Efficient analysis update - Florine
Graeme: How big is the hash?
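Florine: Currently an unsigned 64-bit integer.
Graeme: The chance of a collision becomes significant once the number of entries approaches the square root of the number of possible hash values, and we have a lot of events.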
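The square-root scaling mentioned here is the usual birthday bound. A minimal sketch of the estimate follows, assuming the hash values are uniformly distributed; the function names are illustrative and not part of any ROOT interface.

```python
import math


def collision_probability(n_events: int, hash_bits: int = 64) -> float:
    """Approximate probability of at least one collision among n_events
    uniformly distributed hash values of width hash_bits (birthday bound)."""
    space = 2.0 ** hash_bits
    # p ~= 1 - exp(-n (n - 1) / (2 N)); expm1 keeps precision for tiny exponents.
    return -math.expm1(-n_events * (n_events - 1) / (2.0 * space))


def events_for_probability(p: float, hash_bits: int = 64) -> float:
    """Approximate number of entries at which the collision probability reaches p."""
    space = 2.0 ** hash_bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))


if __name__ == "__main__":
    for n in (10**6, 10**9, 10**10):
        print(f"{n:>12d} entries: p ~ {collision_probability(n):.3g}")
```

Under these assumptions, a 64-bit hash reaches a 50% collision probability at roughly 5 × 10^9 entries, and about 10^9 entries already give a probability of a few percent.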
Florine: This is something to be investigated. We have a naive implementation at the moment. For building the join tables there are also techniques we can use, such as the partition join. This requires further benchmarking and understanding.
Vincenzo: We can always cross-check by reading the join column values in the right-hand-side dataset.
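The cross-check can be illustrated with a small hash join that verifies the actual join column values after the hash lookup. This is a sketch in plain Python, not the implementation discussed in the talk; the row layout (dictionaries) and all function names are assumptions made for the example.

```python
from collections import defaultdict


def build_join_table(rhs_rows, key_columns, hash_fn=hash):
    """Map hash(join key) -> list of row indices in the right-hand-side dataset."""
    table = defaultdict(list)
    for index, row in enumerate(rhs_rows):
        key = tuple(row[c] for c in key_columns)
        table[hash_fn(key)].append(index)
    return table


def lookup(table, rhs_rows, lhs_row, key_columns, hash_fn=hash):
    """Return the indices of matching right-hand-side rows.

    Candidates found through the hash are cross-checked against the real
    join column values, so a hash collision cannot produce a wrong match.
    """
    key = tuple(lhs_row[c] for c in key_columns)
    candidates = table.get(hash_fn(key), [])
    return [i for i in candidates
            if tuple(rhs_rows[i][c] for c in key_columns) == key]


if __name__ == "__main__":
    rhs = [{"run": 1, "event": 10, "x": 1.0}, {"run": 1, "event": 11, "x": 2.0}]
    table = build_join_table(rhs, ("run", "event"))
    print(lookup(table, rhs, {"run": 1, "event": 11}, ("run", "event")))
```

The verification step costs one extra read of the join columns per candidate, which is the performance penalty that grows with the number of collisions.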
Severin: Did you try the case of joining on a file-by-file basis?
Florine: This still requires some performance tuning; we don't have concrete numbers yet.
Severin: What is the bottleneck for the creation of the join table?
Florine: You need to iterate over all files. One bottleneck is memory. Another is the lookup: in principle, if we get a lot of collisions, we pay a performance penalty for having to cross-check.
André: Can ROOT have 1-2 weeks of warnings when moving interfaces out of Experimental?
Danilo: In principle yes; how we do it remains to be seen.
Graeme: A wrapper could raise a warning and then still call the experimental interface, but only for a while.
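The wrapper idea can be sketched generically as follows. This is plain Python and does not reflect how ROOT would actually implement the transition; `warn_and_forward`, `experimental_make_dataset` and `make_dataset` are hypothetical names used only for illustration.

```python
import functools
import warnings


def warn_and_forward(target, message, category=FutureWarning):
    """Return a wrapper that emits a warning and then forwards to target."""
    @functools.wraps(target)
    def wrapper(*args, **kwargs):
        # stacklevel=2 attributes the warning to the caller of the wrapper.
        warnings.warn(message, category, stacklevel=2)
        return target(*args, **kwargs)
    return wrapper


def experimental_make_dataset(name):
    """Hypothetical stand-in for an interface that is about to move."""
    return {"name": name}


# Transitional entry point: warns, then still calls the experimental interface.
make_dataset = warn_and_forward(
    experimental_make_dataset,
    "this interface is moving out of Experimental; update your code",
)
```

Once the transition period is over, the wrapper can be removed without touching the forwarded implementation.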
André: In the dataset specification, can we use wildcards for the list of files?
Vincenzo: Currently this is possible because the specification forwards to the TChain interface, which allows it.
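For an interface that does not accept wildcards itself, the expansion could be done up front. The sketch below uses Python's standard `glob` module on local paths only; it is not how TChain resolves wildcards, and `expand_file_patterns` is a hypothetical helper.

```python
import glob


def expand_file_patterns(patterns):
    """Expand shell-style wildcards into a de-duplicated list of local paths.

    Matches of each pattern are sorted; the order of the patterns is kept.
    """
    files = []
    seen = set()
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        if not matches:
            # Fail early rather than silently building an empty dataset.
            raise FileNotFoundError(f"no files match {pattern!r}")
        for path in matches:
            if path not in seen:
                seen.add(path)
                files.append(path)
    return files
```

Expanding the list explicitly means listing a directory for every pattern, which is one place where the performance cost could come from.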
Florine: In the future we could also add this for RNTuple, though it might incur a performance cost.
Graeme: Have you thought about integration with database management systems? In large distributed jobs people think in terms of dataset names rather than lists of files.
Florine: It's tricky because it can be very specific.
Vincenzo: We could make sure that our building blocks interface well with what experiments need, e.g. Rucio creating a dataset specification easily.
Core libraries update - Aaron
Pere: What is the strategy on the ROOT side? I understand cppyy can be based on this; it also implements most of cling's functionality today. Is ROOT going to use this library to reimplement cling, while also using the latest version of cppyy? Or will it embed cppyy in ROOT?
Aaron: I should clarify that cppinterop does not reimplement cling, it is always built on top of an interpreter such as cling or clang-repl. We need a C++ interpreter in order for cppinterop to work.
Pere: My understanding is that this library provides a more stable API towards the compiler. So I understand ROOT will be based on cppinterop. But at the same time ROOT will continue to use cppyy.
Danilo: I think there is a misunderstanding. Cppinterop is a way in which you can redesign the type system, interfacing with LLVM with no cost (compared to the current strategy with strings). Then you will have an interpreter, a layer on top of the AST (cppinterop) and then a bindings engine (cppyy). This overall reduces costs on all layers.
Pere: In the picture on slide 12 I'm missing something. Is cling part of ROOT or not?
Danilo: cling is currently part of ROOT and will probably remain so for the foreseeable future. At some point cling will be standalone and ROOT will just have it as a dependency.
Vassil: Just to clarify, think of this as building blocks. LLVM provides small bricks, cling provides more bricks, and cppinterop brings even more. To a first approximation, this will reduce the lines of code (LOC) in ROOT-meta in favour of a single call to the cppinterop API. At a later stage, you could decide to ship cling as part of cppinterop, but at that point it makes no difference. For ergonomics, you can start slimming down the ROOT-meta interface so that it becomes a common denominator between cppyy and ROOT. It will help to base both ROOT and cppyy on the same codebase, removing the fork completely.
Graeme: A comment on Julia. This is really very interesting. At the moment we have CxxWrap.jl, a layer that binds C++ and Julia, but it requires additional pieces of C++ code to be written. For big libraries such as ROOT this is complicated. Philippe Gras has been working on an engine that analyses the C++ headers, so this API could help with the integration.
Pere: That would be great, but we still have the problem that, whatever you do, you need to ship the ROOT binaries. Those need to be built, and achieving that requires support for cross-compilation, but ROOT is not there yet.
Danilo: I think this can be put on the PoW if adequate effort is provided and there are enough requests from stakeholders.
Vassil: On the Cxx.jl part, when we started cppinterop I had some discussions in that area. One of the reasons they decided to drop Cxx.jl is that the developer alone had all the knowledge needed to update to the latest LLVM, and this is hard to transfer. If they used cppinterop as a middle layer, this requirement could be relaxed. This person spent a summer in Axel's office, and they had an almost working ROOT with Julia; they already had 90% of what we could do with Julia back then.