GPU Servers
Waiting for parts.
OpenCL
No news.
Highly Ionizing Particle
Current version:
- Single warp streams neighboring pads in cacheline
- rocprof: 65% Memory Unit utilization (throughput: 130 GB/s, not yet confirmed)
New version:
- 576 threads to read full row (140 pads * 8 timebins)
- Most rows smaller than 140 pads -> wasted threads
- Memory Unit utilization drops to 36%
- Slower by factor 1.57 (utilization ratio: 65 / 36 ≈ 1.81, so the two numbers do not match)
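
Rough sketch of the wasted-thread estimate for the new version. Assumption (not from the measurements): work is spread evenly over pads, so a row with n pads keeps about n/140 of the 576 threads busy. The function name `idle_fraction` and the row widths in the loop are made up for illustration, not the real geometry.

```python
MAX_PADS = 140          # widest row, from the notes
THREADS_PER_ROW = 576   # threads launched per row, from the notes


def idle_fraction(n_pads, max_pads=MAX_PADS, threads=THREADS_PER_ROW):
    """Fraction of launched threads that have no work for a row of n_pads pads.

    Assumption: threads are assigned proportionally to pads, so a row needs
    ceil(threads * n_pads / max_pads) active threads.
    """
    if not 0 <= n_pads <= max_pads:
        raise ValueError("n_pads must be between 0 and max_pads")
    # Integer ceiling division avoids floating-point rounding issues.
    active = -(-threads * n_pads // max_pads)
    return (threads - active) / threads


# Hypothetical row widths, only to show the trend.
for n in (70, 100, 140):
    print(f"{n} pads: {idle_fraction(n):.1%} of threads idle")
```

Under this assumption a row with half the maximum width leaves half of the threads idle, which would be consistent with the drop in Memory Unit utilization, but the actual thread-to-pad mapping should be checked against the kernel.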