Color code: (critical, news from this week: blue, news from last week: purple, no news: black)

Sync reconstruction

Async reconstruction

Need to investigate short GPU stall problem.
Limiting factor for pp workflow is now the TPC time series, which is too slow and creates backpressure (costs ~20% performance on EPNs). Enabled multi-threading as recommended by Matthias - need to check if it works.
Test with GPU GRID jobs at NERSC pending.
Will tune existing 16-core settings, add a SITEARCH for 16core CPU, and 16coreCPU + generic NVIDIA / AMD GPU, like for 8 core.
Will retune EPN async workflow for TPC + ITS on GPU on 2025 data.

GPU ROCm / compiler topics:

Problem with building ONNXRuntime with MIGraphX support.
Need to find a way to build ONNXRuntime with support for CUDA and for ROCm.
Try to find a better solution for the problem with __device__ inline functions leaking symbols in the host code.
Need to check ROCm 7.2 correctness.
Need to understand and fix crash on RTX Pro 6000.

TPC / GPU Processing

WIP: Use alignas() or find a better solution to fix alignment of Monte Carlo labels: https://its.cern.ch/jira/browse/O2-5314
Waiting for TPC to fix bogus TPC transformations for good, then we can revert the workaround.
Final solution: merging transformation maps on the fly into a single flat object:
- Maps now yield correct results, but with a 1.5x performance regression when running on GPUs. Must be investigated.
Need to check the problem with ONNX external memory allocator.
Next high priority topic: Improvements for cluster sharing and cluster attachment at lower TPC pad rows. PR: https://github.com/AliceO2Group/AliceO2/pull/14542
TODO: Workaround for wrong field used for encoding online; make memory scaling factors configurable via ConfigurableParam.
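
Regarding the alignas() item above: a minimal sketch of the idea, assuming the problem is label objects being placed into a raw byte buffer that has no alignment guarantee. ExampleLabel, AlignedLabelBuffer, isAligned and constructLabelAt are hypothetical names for illustration only, not the actual O2 types or the fix tracked in the ticket.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical stand-in for a Monte Carlo label; NOT the actual O2 type.
struct ExampleLabel {
  std::uint32_t trackId;
  std::uint16_t eventId;
  std::uint16_t sourceId;
};

// Raw byte storage forced to the label's alignment via alignas(), so that
// constructing labels at the start of the buffer is well-defined.
struct alignas(alignof(ExampleLabel)) AlignedLabelBuffer {
  unsigned char bytes[64];
};

// The buffer type must carry exactly the alignment requested above.
static_assert(alignof(AlignedLabelBuffer) == alignof(ExampleLabel),
              "buffer must be aligned like the label type");

// Returns true if ptr is a multiple of the given alignment.
inline bool isAligned(const void* ptr, std::size_t alignment)
{
  return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Constructs a zero-initialized label at ptr; asserts the alignment first.
inline ExampleLabel* constructLabelAt(void* ptr)
{
  assert(isAligned(ptr, alignof(ExampleLabel)));
  return new (ptr) ExampleLabel{};
}
```

Without the alignas() specifier the byte array would only be guaranteed 1-byte alignment, so the check in constructLabelAt could fail depending on where the buffer ends up in memory.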

Other topics:

Need to bump ONNXRuntime to 1.24 (needed for ROCm 7.2); Giulio is checking. Status?
Status of bumping CMake and boost (https://github.com/alisw/alidist/pull/6135):
- Remaining issues:
  - libwebsockets kernel headers
  - Ernst is following up on issues with ODC/DDS
  - aarch uses an old Python and tries to compile an old xgboost, which is incompatible.
  - One compilation problem on macOS with boost histogram.

EPN GPU Topics: