Felix:

Since last time, I added a feature to the cpu-allocator to use the pinned memory from the gpu-framework: all data transferred to the GPU is now prepared in pinned host memory and transferred via DMA (https://github.com/AliceO2Group/AliceO2/pull/14681). Previously, we allocated via the system allocator and pinned (and later unpinned) the memory ourselves for every TF. I did not measure the impact on timing, but this should clearly be an improvement.
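As a minimal CUDA sketch of the difference (hypothetical function and buffer names, not the actual cpu-allocator code): the old path allocated with the system allocator and had to pin/unpin for every TF, while the new path prepares the data directly in a page-locked buffer that is allocated once and reused, so each copy is a true asynchronous DMA transfer.

```cuda
#include <cuda_runtime.h>
#include <cstdlib>
#include <cstring>

// Old approach (sketch): system allocation, pinned and unpinned per TF.
void transferPerTF(const void* src, size_t bytes, void* devPtr, cudaStream_t stream)
{
  void* host = std::malloc(bytes);
  std::memcpy(host, src, bytes);                            // stage the TF data
  cudaHostRegister(host, bytes, cudaHostRegisterDefault);   // pin (costly, per TF)
  cudaMemcpyAsync(devPtr, host, bytes, cudaMemcpyHostToDevice, stream);
  cudaStreamSynchronize(stream);
  cudaHostUnregister(host);                                 // unpin (costly, per TF)
  std::free(host);
}

// New approach (sketch): allocate pinned memory once, prepare data in place,
// and reuse the buffer across TFs; cudaMemcpyAsync can then overlap with compute.
void* allocPinnedOnce(size_t bytes)
{
  void* host = nullptr;
  cudaMallocHost(&host, bytes);  // page-locked, DMA-capable allocation
  return host;                   // released with cudaFreeHost at teardown
}
```

This removes the per-TF cudaHostRegister/cudaHostUnregister calls, which involve OS page-table work, from the hot path.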

Waiting for a recipe to clear parts of the memory between iterations; then I will repeat the test production.

What else is needed to commission ITS GPU tracking for async?

News from ITS vertexing (Gabriele)

  1. Is this the best way to split the work among GPU threads?
  2. Is this the best way to deliver the data to GPU?
  3. Is this the best algorithm we can use? Can we find a more GPU-friendly one? (such that the CPU version is also optimized and determinism is not broken)