Description
The CBM experiment is expected to run with a data rate exceeding 500 GB/s even after averaging. At this rate, storing the raw detector data is not feasible; efficient online reconstruction is required instead. GPUs have become essential for HPC workloads: their higher memory bandwidth and parallelism can provide significant speedups over traditional CPU applications. These properties also make them a promising target for the planned online processing in CBM.
We present an online hit finder for the STS detector capable of running on GPUs. It consists of four steps and takes STS digis (timestamped detector messages) as input. Digis are first sorted by sensor; then, within each sensor, they are sorted by channel and timestamp. Neighboring digis are combined into clusters. Finally, after time-sorting, the clusters on each sensor are combined into hits.
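The four steps above can be sketched in simplified form as follows. This is an illustrative sketch only, not the actual CBM implementation: the digi record layout `(sensor, channel, time, charge)`, the clustering thresholds, and the reduction of each cluster directly to a hit are all assumptions for the example (in the real detector, hits are formed by matching clusters across sensor sides).

```python
from itertools import groupby

def find_hits(digis, time_window=2, channel_gap=1):
    """Sketch of the four-step hit finder. Digis are hypothetical
    (sensor, channel, time, charge) tuples; thresholds are illustrative."""
    hits = []
    # Step 1: sort digis by sensor.
    digis = sorted(digis, key=lambda d: d[0])
    for sensor, group in groupby(digis, key=lambda d: d[0]):
        # Step 2: within the sensor, sort by channel, then timestamp.
        sensor_digis = sorted(group, key=lambda d: (d[1], d[2]))
        # Step 3: combine neighboring digis (adjacent channels, close in
        # time) into clusters.
        clusters = []
        for d in sensor_digis:
            last = clusters[-1][-1] if clusters else None
            if last and d[1] - last[1] <= channel_gap \
                    and abs(d[2] - last[2]) <= time_window:
                clusters[-1].append(d)
            else:
                clusters.append([d])
        # Step 4: time-sort the clusters, then reduce each to a hit
        # (here: mean charge and earliest timestamp).
        clusters.sort(key=lambda c: min(d[2] for d in c))
        for c in clusters:
            hits.append((sensor,
                         sum(d[3] for d in c) / len(c),
                         min(d[2] for d in c)))
    return hits
```

In the GPU version described above, each sensor (or sensor side) can be processed independently, so the outer loop maps naturally onto GPU blocks.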
Each of these steps is trivially parallel across STS sensors, or even across sensor sides. To fully utilize the GPU hardware, we modify the algorithms to be parallel at the digi or cluster level. This includes a custom implementation of parallel merge sort that allows full parallelism within GPU blocks.
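The structure that makes merge sort amenable to block-level parallelism can be illustrated with a bottom-up formulation: within each pass, every pairwise merge is independent of the others, so on a GPU all merges of a pass can run concurrently. The sketch below executes the merges sequentially for clarity and is an assumption-laden illustration, not the custom implementation referred to above (which additionally splits each individual merge across threads).

```python
def merge(a, b):
    # Standard two-way merge of two sorted lists.
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def block_merge_sort(data):
    """Bottom-up merge sort. Within each pass, the pairwise merges are
    independent, so on a GPU they could all run concurrently inside a
    block; here they run sequentially for illustration."""
    runs = [[x] for x in data]
    while len(runs) > 1:
        runs = [merge(runs[i], runs[i + 1]) if i + 1 < len(runs) else runs[i]
                for i in range(0, len(runs), 2)]
    return runs[0] if runs else []
```

Each pass halves the number of runs, so a block of threads stays busy across log₂(n) passes; achieving *full* parallelism within a block additionally requires partitioning each merge itself across threads, which this sketch omits.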
Our implementation achieves a speedup of 24 on mCBM data compared to the same code running on a single CPU core. The exact throughput achieved will be shown and discussed in the presentation.
This work is supported by BMBF (05P21RFFC1).