# ITS tracking status

1. Quick overview of current CPU refactoring
2. New features currently in development
3. Status of the GPU part

---

## 1.1 Current CPU refactoring


---

## 1.2 Memory consumption

Here for one particulary parasitic TF in Pb-Pb with two central collisions in on readout-frame. 
Plot shows before(blue)/after(red) refactoring.


Plot shows a special mode `perPrimaryVertexProcessing` which reduces the comb. significantly but has some currently studied side effects. Blue is with this mode enabled now, red as it is before.

## 2. New/Improved features: `DeltaROF` tracking
In contrived MC with exagerated slope in the chip response function one can clearly see the drop of eff. towards the beginning/end of the readout-frame.
In analysis these collisions are completely cut away (standard selections) ~20% of data loss.
This feature should cure this.

Comes with an increase cost in memory and time (but worth it)


## GPU Part
After the refactoring of the reco code, checked again if GPU is now messed up. After some fixes minor fixes, in debug mode the output is 1:1 the one from the cpu code (10TFs in 40kHz Pb-Pb):

```cpp
#define GPU_BLOCKS GPUCA_DETERMINISTIC_CODE(1, 99999)
#define GPU_THREADS GPUCA_DETERMINISTIC_CODE(1, 99999)

// calling kernels like this -> effectively serializing the code (very slow)
gpu::computeLayerTrackletsMultiROFKernel<true><<<o2::gpu::CAMath::Min(nBlocks, GPU_BLOCKS),
                                                     o2::gpu::CAMath::Min(nThreads, GPU_THREADS),
                                                     0,
                                                     streams[iLayer].get()>>>(...)
// now this is not needed anymore and the det. mode is `fast-ish`
```

TODOs (help welcome)
    - validate new features if they also work on GPU
    - porting Vertexing code to GPU
    - optimize GPU part:
        - `started` using multiple streams very much in its infancy
        - optimise load times/processing etc. (probably [not measured] parts could be improved)

        - optimise number of blocks/threads per kernel (for Pb-Pb + pp sep., fairly small number of kernels to be optimised)

          - Have a CI procedure/periodic check that runs ITS GPU reconstruction and then gets some automatic checks and numbers (deterministic mode is more important)

          - Ensure gpu-reco-workflow with ITS enabled are the same that one gets from its-reco-workflow --gpu