GPUCA_KERNEL_RESOURCE_USAGE_VERBOSE 1
block_size = 512, grid_size = 540 for GMMergerCollect
512, 9 in the parameter header --> try to fit 9 blocks of 512 threads each on each CU
GPUCA_KERNEL_RESOURCE_USAGE_VERBOSE said: Occupancy [waves/SIMD]: 9
Chatted with Robin yesterday. They are inclined to use the CI container directly.
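A quick cross-check of the GMMergerCollect launch numbers noted above (block_size = 512, grid_size = 540, and 512, 9 in the parameter header). This is only a sketch: the wavefront size of 64 and the 4 SIMDs per CU are assumptions typical of AMD GCN/CDNA hardware, not values taken from these notes, and the function names are hypothetical.

```python
def waves_per_simd(block_size, blocks_per_cu, wavefront_size=64, simds_per_cu=4):
    """Waves per SIMD if `blocks_per_cu` blocks were resident on one CU.

    wavefront_size and simds_per_cu are assumed defaults, not from the notes.
    """
    # Round up: a partially filled wavefront still occupies a full slot.
    waves_per_block = -(-block_size // wavefront_size)
    return waves_per_block * blocks_per_cu / simds_per_cu


def cus_covered(grid_size, blocks_per_cu):
    """Number of CUs a grid fills when each CU holds `blocks_per_cu` blocks."""
    return grid_size / blocks_per_cu


if __name__ == "__main__":
    print(waves_per_simd(512, 9))  # 18.0 under the assumptions above
    print(cus_covered(540, 9))     # 60.0
```

Under these assumptions, nine resident blocks of 512 threads would mean 18 waves per SIMD, while the verbose output reports 9. It may be worth checking whether the second header value counts blocks per CU or waves per SIMD.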
From Nvidia docs:
CUDA Compatibility guarantees allow for upgrading only certain components.
Backwards compatibility ensures that a newer NVIDIA driver can be used with an older CUDA Toolkit.
Minor version and forward compatibility ensure that an older NVIDIA driver can be used with a newer CUDA Toolkit (up to a certain version).
FAQ: Does CUDA compatibility work with containers? Yes, when using containers that are based on the official CUDA base images.
| GPU model | Driver version |
| --- | --- |
| Tesla T4 | 575.57.08 (CUDA Version: 12.9) |
| A100 | 575.57.08 (CUDA Version: 12.9) |
| V100S | 560.35.05 (CUDA Version: 12.6) |
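
To see which of the compatibility modes quoted above a given container would rely on, the CUDA versions from the table can be compared against the toolkit version in the image. This is a rough sketch only: the container toolkit version `12.9` is a placeholder assumption, the function and dictionary names are hypothetical, and the classification ignores minimum-driver requirements and GPU-specific restrictions that the NVIDIA documentation describes.

```python
# CUDA version reported by each driver, copied from the table above.
DRIVER_CUDA = {
    "Tesla T4": "12.9",
    "A100": "12.9",
    "V100S": "12.6",
}


def parse_version(text):
    """Turn '12.6' into the tuple (12, 6) so versions compare numerically."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def compatibility_path(driver_cuda, toolkit_cuda):
    """Classify which compatibility mode a driver/toolkit pair relies on."""
    driver = parse_version(driver_cuda)
    toolkit = parse_version(toolkit_cuda)
    if toolkit <= driver:
        return "backward"       # newer (or equal) driver, older toolkit
    if toolkit[0] == driver[0]:
        return "minor-version"  # older driver, newer toolkit, same major
    return "forward"            # older driver, toolkit from a newer major


if __name__ == "__main__":
    container_toolkit = "12.9"  # assumed toolkit version in the CI container
    for gpu, cuda in DRIVER_CUDA.items():
        print(gpu, compatibility_path(cuda, container_toolkit))
```

With the assumed toolkit version, the T4 and A100 machines fall under backward compatibility, while the V100S machine would depend on minor version compatibility.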
What are the differences between the sync and async runs of the benchmark, apart from RTC and compression/decompression?