Description
High-energy physics (HEP) analyses involve complex computations over large, irregular, nested data structures. Libraries such as Awkward Array have demonstrated that the massive parallelism of GPUs can accelerate these analyses. Today, however, this requires significant expertise from both library developers and end users, who must navigate the low-level details of CUDA kernel programming and often write kernels by hand in Numba or CUDA C++.
This tutorial introduces the CUDA Core Compute Libraries for Python (cuda-cccl), designed to simplify parallel programming on NVIDIA GPUs. It exposes parallel primitives such as reduce, sort, and histogram, along with tools to combine them into more complex algorithms. In particular, algorithms can be segmented, that is, applied independently to each event's slice of a flat array, to achieve the efficient event-level parallelism that is crucial to HEP analyses. Together, these primitives and tools enable Python developers to compose high-performance GPU algorithms that result in efficient, fused kernels, without ever leaving Python or writing low-level CUDA.
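To make "segmented" concrete, the sketch below shows what a segmented reduction computes for jagged, per-event data stored as a flat list of values plus event offsets. It is a plain-Python CPU reference for the semantics only and does not use the cuda-cccl API; the function name `segmented_sum` and the sample values are assumptions made for illustration.

```python
from itertools import accumulate


def segmented_sum(values, offsets):
    """Sum values within each segment [offsets[i], offsets[i + 1]).

    CPU reference only (hypothetical helper, not part of cuda-cccl).
    A GPU segmented reduce would produce the same per-event results,
    processing the segments in parallel.
    """
    # Prefix sums with a leading zero: prefix[k] == sum(values[:k])
    prefix = [0.0, *accumulate(values)]
    # Each segment sum is a difference of two prefix sums,
    # which also yields 0.0 for empty segments.
    return [prefix[stop] - prefix[start]
            for start, stop in zip(offsets, offsets[1:])]


# Three events with 2, 0, and 3 particles (illustrative values).
pt = [30.0, 20.0, 50.0, 10.0, 5.0]
offsets = [0, 2, 2, 5]
per_event_sum = segmented_sum(pt, offsets)
print(per_event_sum)
```

The flat-values-plus-offsets layout is the reason a single segmented primitive can cover every event at once: no per-event Python loop or padding to a rectangular shape is needed.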
This tutorial is for both maintainers of Python libraries such as Awkward Array and analysts interested in speeding up HEP analysis algorithms with GPUs. Participants will learn how to use cuda-cccl to develop optimized GPU algorithms purely from Python, and how to make GPU acceleration more accessible and maintainable across the HEP ecosystem.