Profiling Celeritas

Peter Heywood, Research Software Engineer

The University of Sheffield

2024-03-27

Context

Increase Science Throughput

  • Ever-increasing demand for simulation throughput
  1. Buy more / “better” hardware
  2. Improve Software
    • Improve implementations
    • Improve algorithms (i.e. work efficiency)
  • Improving performance first requires understanding current software performance

Profile

Profiling Tools

  • CPU-only profilers
    • gprof, perf, KCachegrind, VTune, …
  • AMD Profiling tools
    • roctracer
    • rocsys
    • rocprofv2

Celeritas

The Celeritas project implements HEP detector physics on GPU accelerator hardware with the ultimate goal of supporting the massive computational requirements of the HL-LHC upgrade.

Graphics Processing Unit(s)

  • Highly parallel, many-core co-processors
  • Optimised for throughput
  • (Relatively) low capacity of high-bandwidth memory
  • Power efficient (for suitable workloads)
  • Often connected to the host CPU via (relatively) low-bandwidth PCIe

Titan Xp & Titan V GPUs

NVIDIA Grace Hopper Superchip

  • GH200 480GB
    • 72-core ARM CPU
    • 480GB LPDDR5X
    • H100 GPU (132 SMs)
    • 96GB HBM3 (4TB/s)
    • NVLink-C2C 900 GB/s bidirectional bandwidth
    • Programmable power: 450-1000W (CPU + GPU + memory)
  • Three GH200s now available in the Bede Tier 2 HPC facility