News
No news
Meeting News
PPP
There will be a PPP this week.
[Monica]: Next week: Ruban (University of Amsterdam), using GPUs as decompression accelerators.
Topics
ACAT2024:
- Small venue (120 people).
- Poster sessions squeezed into coffee breaks.
- Performance metrics not presented in a consistent way.
A few talks related to RNTuple:
- RNTuple in Athena: Comment: how to get involved in the (soon-to-start) RNTuple work with ATLAS?
- RNTupleInspector: The mention of the analyzer/linter drew interest from the audience.
- RNTuple analysis with RDF: A couple of questions on performance.
- HEP-CCE: It is important to have a full picture of what we're doing. It will bring tangible benefits.
Talks on ML:
- Interest in Sofie.
- Memory usage of inference running the C++ code versus the TensorFlow code: we don't have the numbers; we should investigate further.
- It could be interesting to reach out to the community to find out whether they are using this and to identify the missing parts.
Chats
Liz:
- Reassured that we are collaborating on RNTuple support for the CMS EDMs.
Jim:
- Discussion on compression.
- Birds of a Feather session on a HEP help desk, similar to the aarchi model: an LLM trained on all the ROOT documentation and all related information, providing a centralized place to ask. The tool doesn't provide the answer itself but points to it.
Gordon:
- Question about debuggability: make tools that don't need debugging.
- RNTuple: when asked specifically, he said he is not going to work on it unless the experiment forces him to.
Natalie:
- Discussion on HEPScore benchmarks.
Tomasso:
1) Varied snapshot:
- Process the output files separately: they have different cardinalities and can be worked on separately.
[Philippe]: This requires some duplication of data.
[Florine]: If you have a T3 index, you can write only the varied parts and work around it.
[Philippe]: We would still duplicate all the sparse data, but right.
[Vincenzo]: They want to reconstruct the data in a generic way, even with some performance issues. It's worth a try and worth benchmarking.
2) Objectification of NanoAOD inputs.
- CMS can already do it with a tool called Bamboo; this is an official request.
- Danilo's solution: (user-defined) C++ classes for the in-memory representation, or a thin layer that does the graph generation.
[Jacob]: The situation is much better than expected; there is a way to do it, but for CMS users it's OK.
[Vincenzo]: Bamboo is a whole framework on top of CMS. It's always a trade-off.
[JonasH]: Don't use objects if you need performance.
Impact of Cppy Upgrade:
- The impact is small.
- Not duplicating code anymore.
Summary: Cppy consists of several parts: a Python library, a C++ extension, and a wrapper around Cling (which includes ROOT Meta). We now synchronize with upstream, except for the parts taken from ROOT Meta and Cling.
Implicit conversion of std::string to Python string.
- The only motivation for not doing this is Unicode conversion and safety.
- If users explicitly type-check the result, their code might fail or crash.
[JonasH]: Check whether it's Unicode; if not, return bytes for the non-Unicode content. For convenience, if it's Unicode as I expect, I don't want it to crash.
[Aaron]: With different encodings there were some corner cases.
JonasR: A check at the Python level would work, but with some performance overhead.
[JonasH]: There must be a function to convert a character array to a Python string.
JonasR: In that case you also have to check whether it is convertible first.
[JonasH]: You only have to catch the error correctly; there's a way to do it with CPython as well. The nice thing is that the code at the bottom [of the slide] also works.
JonasR: Advocates keeping the bytes object or erroring out.
[Jacob]: Suggestion: keep bytes; it gives the largest usability surface.
JonasR: Keeping a standard string would be the same as upstream.
[Vincenzo]: I agree; if it can go upstream, there is no reason not to do it. If not, we just have an extra patch. Instead of crashing, emit a warning or an error, and then we could remove the patch.
JonasR: It crashes anyway, so nobody uses it. It can be an error directly.
JonasR: It would be nice if it were consistent, rather than sometimes returning Unicode and sometimes bytes.
JonasR: It's either improving the error or returning the bytes object directly. I prefer a clear error (if it cannot be converted to Unicode).
[Jacob]: Handling of the valid case is the problem.
[Philippe?]: In the long term, there is no way of getting a non-Unicode string in Python.
[JonasH]: It's not important to users, because they have no way either.
[Vassil]: Is it only going to break the Unicode case?
[Vincenzo]: Non-Unicode already crashes.
[Vassil]: So we're discussing a hypothetical case?
- There is no difference for the cases that currently work.
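To make the two options above concrete, here is a minimal Python sketch. It models the std::string contents as a bytes object and assumes UTF-8 as the target encoding; the function names are hypothetical and this is not the Cppy implementation.

```python
from typing import Union


def convert_lenient(raw: bytes) -> Union[str, bytes]:
    """Option raised by JonasH: return str if the bytes are valid UTF-8, else bytes."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Non-Unicode content is handed back unchanged as a bytes object.
        return raw


def convert_strict(raw: bytes) -> str:
    """Option preferred by JonasR: always return str, or raise a clear error."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as err:
        raise TypeError(
            f"std::string is not valid UTF-8 (invalid byte at offset {err.start}); "
            "request the raw bytes explicitly instead"
        ) from err


# Hypothetical usage: the byte 0xff can never appear in valid UTF-8.
print(convert_lenient(b"muon"))  # muon
print(convert_lenient(b"\xff"))  # b'\xff'
```

The lenient variant returns different types depending on the content, which is the consistency concern JonasR raised; the strict variant always returns str, at the cost of an exception for non-Unicode data.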
"Strict" memory policy
- There's a flag in Cppy to change this memory policy.
- There are some bugs in the current Cppy regarding these memory heuristics because they are not tested. In the future, we can think about synchronizing this policy.
[Jacob]: These heuristics are a bit strange when you look at them from the C++ perspective.
JonasR: There are some void pointer cases from the early versions of PyROOT. It would not be difficult to move to the strict memory policy.
[Vassil]: One could implement an LLVM pass that detects calls to delete. Some annotations on the interfaces would be useful.
JonasR: We can annotate at the Python level, but it would be difficult; it would be nice to do it at the C++ level.
[Vincenzo]: Solution: set the memory policy to strict by default. It only applies to older parts of ROOT. It seems to me that a Clang annotation is overkill.
[Jacob]: What about third parties that depend on this case?
[Vincenzo]: Anything or any framework based on PyROOT from before a certain point is going to break.
[Vassil]: It is very important to add annotations for nullability and ownership, e.g. "if this is going to be null, then don't call".
[JonasH]: If the APIs want to make sure there are no null pointers, they should take a reference.
[Vincenzo]: Clarification: there is already existing infrastructure, so we don't need any extra work to make this happen?
No implicit conversion for char to null
- char[] converted to Unicode string.
- To convert the buffer back to a Unicode string, use the function.
- This is pretty reasonable.
JonasR: If it's not null-terminated, Cppy already knows it. You can work around that.
[JonasH]: Do we do this right now?
JonasR: In ROOT we have some unit tests with char buffers that contain country codes; we had to add asterisks to make them work.
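A minimal Python sketch of the char[] handling discussed above, assuming the buffer content is UTF-8 and using ctypes.create_string_buffer to stand in for a fixed-size C char array. The helper name is hypothetical and this is not the Cppy code path: conversion stops at the first null byte, and a buffer without a terminator is decoded in full rather than read past its end.

```python
import ctypes


def char_buffer_to_str(buf: bytes, encoding: str = "utf-8") -> str:
    """Decode a fixed-size char buffer up to its first null byte (hypothetical helper)."""
    end = buf.find(b"\x00")
    if end == -1:
        # Not null-terminated: use the whole buffer instead of reading past it.
        end = len(buf)
    return buf[:end].decode(encoding)


# Model a C "char code[4]" holding a two-letter country code.
code = ctypes.create_string_buffer(b"CH", 4)
print(code.raw)                      # b'CH\x00\x00'
print(char_buffer_to_str(code.raw))  # CH
print(char_buffer_to_str(b"DE"))     # DE (no terminator)
```

Checking for the terminator explicitly reflects the point that a non-null-terminated buffer is a case that can be detected and worked around.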
Performance:
- Just ran the tests to show the numbers: compared the runtime with and without the upgrade, and it's basically the same.
JonasR: It could be that the implicit conversion isn't done anymore; that might explain some of the improvement.
JonasR: We still have two months before the release, so we have time to fix any issues.
Summary:
- Memory: stick with the current heuristics; change with the next release.
- remove implicit conversion.
[JonasH]: Are all the corner cases solved?
JonasR: There are even fewer tests failing than before.
[Vincenzo]: For further developments: we only have a patched Cling wrapper, and everything else in the Cppy stack is the same?
JonasR: No, there are some changes and reverts. The implicit std namespace is an example; TString needs a custom converter; ROOT type aliases such as Long64_t, etc.
[Vincenzo]: We can do something similar to what we did for LLVM. Create a monorepo and replicate what we did.
JonasR: All the patches are in one directory. Everything is traced.
[Vassil]: The real problem is going to be in the backend; for I/O we want one thing. It is going to converge.
../
{Discussions without converging on a solution for future upgrades}
../
[Jacob]: We move the question of how to handle future upgrades to the next slot or another meeting.
[SKIPPED ROUNDTABLE]
- Task for everyone: check the program of work and fill it in, because there is a quarterly review coming up.
[Meeting Ended]