Work over the last week was mainly aimed at the CHEP presentation.
- ITS GPU Tracking
- Time measurements at different PbPb interaction rates (kHz): 6.2, 12.6, 18.0, 22.5, 28.0, 29.0, 33.0, 43.0
- Compared its-reco with 20 threads vs. 20 threads + GPU tracking: the two are equivalent at each interaction rate (IR). A better visualisation is in preparation.
- Compared its-reco with 1 thread vs. 1 thread + GPU: the GPU version is faster, see example.

- A comparison of the fitting times will also be shown.
- The plan is to show these results together with the consistency plots and to target the async pass of PbPb '24, so that we can run the ITS fitting on GPU.
- DCA Fitter on GPU
- Further development: more intelligent scheduling of the fitters to profit from multi-channel I/O and parallel execution of the kernels
- Relies on an external singleton interface for GPU streams and allocation (pluggable into the GPU reconstruction for reconstruction tasks, or into direct CUDA APIs for analysis).
- Improved the GPU DCAFitterTest so that it serves as a proper benchmark and results check.
- Scaling tests are ongoing, with up to 1M simultaneous fits. This scale applies to some HF PbPb analyses (1M per collision). I will try 10M on V100 and MI50/MI100.
- The goal is to show the successful porting of a fundamental part of the reconstruction code that would serve SVertexing and similar tasks in the future and will be portable to analysis.
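
For illustration only, the pluggable singleton interface mentioned under the DCA fitter could be sketched roughly as below. All names (`GPUBackend`, `HostMockBackend`, `GPUResources`, `plug`, `streamFor`) are hypothetical and are not the actual O2 interface. The backend here is a host-only stand-in so that the sketch compiles without a GPU toolchain, and the round-robin stream assignment is an assumption, not the scheduling scheme under development.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <utility>

// Hypothetical abstract backend: decides who owns streams and device memory.
class GPUBackend {
 public:
  virtual ~GPUBackend() = default;
  virtual void* allocate(std::size_t bytes) = 0;
  virtual void release(void* ptr) = 0;
  virtual int numStreams() const = 0;
};

// Host-only stand-in (assumption, for illustration). A real backend would
// either forward to the GPU reconstruction framework or call the CUDA
// runtime directly for analysis use.
class HostMockBackend final : public GPUBackend {
 public:
  explicit HostMockBackend(int nStreams) : mNStreams(nStreams) {}
  void* allocate(std::size_t bytes) override {
    ++mLive;
    return std::malloc(bytes);
  }
  void release(void* ptr) override {
    --mLive;
    std::free(ptr);
  }
  int numStreams() const override { return mNStreams; }
  int liveAllocations() const { return mLive; }

 private:
  int mNStreams;
  int mLive = 0;
};

// Singleton access point that the fitter code would query.
class GPUResources {
 public:
  static GPUResources& instance() {
    static GPUResources resources;
    return resources;
  }
  // Install the backend once, before any fitter asks for resources.
  void plug(std::unique_ptr<GPUBackend> backend) { mBackend = std::move(backend); }
  GPUBackend& backend() {
    assert(mBackend && "no backend plugged");
    return *mBackend;
  }
  // Round-robin mapping of a batch of fits to a stream index (assumption).
  int streamFor(std::size_t batchIndex) {
    const auto n = static_cast<std::size_t>(backend().numStreams());
    return static_cast<int>(batchIndex % n);
  }

 private:
  GPUResources() = default;
  std::unique_ptr<GPUBackend> mBackend;
};
```

In such a design the fitter only ever talks to `GPUResources::instance()`, so switching between a reconstruction-owned backend and a direct-CUDA backend would be a single `plug(...)` call at start-up.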