Felix:
Since last time, added a 'feature' to the CPU allocator so that it uses pinned memory from the GPU framework. All data transferred to the GPU is now prepared in pinned host memory and transferred via DMA (https://github.com/AliceO2Group/AliceO2/pull/14681). Previously, we allocated via the system allocator and then pinned (and later unpinned) the memory ourselves for every TF. I did not measure the impact on the timing, but avoiding the per-TF pinning and unpinning should clearly be better.
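For illustration only, a minimal sketch of the "allocate directly in pinned memory" idea, not the code from the PR. It assumes a `std::pmr`-style allocator interface; `PinnedBackend`, `PinnedResource` and `stageTimeframe` are hypothetical names. In a CUDA build the backend would presumably map to `cudaMallocHost` / `cudaFreeHost`, whereas the old scheme corresponds to `cudaHostRegister` / `cudaHostUnregister` on system-allocated buffers. Here the backend falls back to `malloc` so the sketch runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory_resource>
#include <new>
#include <vector>

// Hypothetical stand-in for the GPU framework's pinned-host-memory calls.
// Assumption: a real build would call cudaMallocHost / cudaFreeHost here.
// This version uses the system allocator and only counts the calls.
struct PinnedBackend {
  std::size_t nAlloc = 0;
  std::size_t nFree = 0;
  void* alloc(std::size_t bytes) {
    ++nAlloc;
    return std::malloc(bytes == 0 ? 1 : bytes);
  }
  void release(void* p) {
    ++nFree;
    std::free(p);
  }
};

// Memory resource that hands out buffers obtained from the pinned backend,
// so containers using it never need a separate pin/unpin step.
class PinnedResource : public std::pmr::memory_resource {
 public:
  explicit PinnedResource(PinnedBackend& backend) : mBackend(backend) {}

 private:
  void* do_allocate(std::size_t bytes, std::size_t align) override {
    // The sketch only supports fundamental alignments.
    assert(align <= alignof(std::max_align_t));
    void* p = mBackend.alloc(bytes);
    if (p == nullptr) {
      throw std::bad_alloc();
    }
    return p;
  }
  void do_deallocate(void* p, std::size_t, std::size_t) override {
    mBackend.release(p);
  }
  bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
    return this == &other;
  }
  PinnedBackend& mBackend;
};

// Prepare the data of one TF directly in memory from the given resource.
std::size_t stageTimeframe(std::pmr::memory_resource& res, std::size_t nElements) {
  std::pmr::vector<float> staging(nElements, 1.0f, &res);
  // A real implementation would issue the host-to-device copy here.
  return staging.size();
}
```

With this layout the staging buffer is already page-locked when it is filled, so the transfer can go via DMA without an extra registration step per TF. Whether the actual allocator pools or reuses the pinned buffers between TFs is not covered by this sketch.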
Waiting for a recipe to clear parts of the memory between iterations; will then repeat the test production.
What else is needed to commission ITS GPU tracking for async?