Global Parameter Optimisation

Input dataset simulation

Simulated several timeframes.

Every configuration was simulated twice: once with a 32-orbit timeframe and once with a 128-orbit timeframe.

For the moment there is just one simulation per configuration (beam type / interaction rate / timeframe length).
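The configuration matrix described above can be sketched as a Cartesian product. Note that the concrete beam types and interaction rates below are placeholders for illustration; the notes do not list the actual values used:

```python
from itertools import product

# Placeholder values -- the actual beam types and interaction rates
# used in the simulations are not listed in these notes.
beam_types = ["pp", "PbPb"]
interaction_rates_khz = [50, 500]
timeframe_lengths_orbits = [32, 128]  # each configuration simulated at both lengths

# For now, one simulation per (beam type, interaction rate, timeframe length).
configurations = list(product(beam_types, interaction_rates_khz, timeframe_lengths_orbits))

for beam, rate, orbits in configurations:
    print(f"simulate: beam={beam} rate={rate} kHz timeframe={orbits} orbits")
```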

GPU Parameters study

Focusing on grid and block size. Analysed the GPU workflow of the sync/async TPC processing. The image below shows the workflow of two HIP streams of the sync TPC processing:

By looking at the tracefile:

Optimisation strategy

Possible bug spotted

HIP_AMDGPUTARGET set to "default" in GPU/GPUTracking/Standalone/cmake/config.cmake translates into HIP_AMDGPUTARGET=gfx906;gfx908 and forces the use of the MI50 parameters.

Here HIP_AMDGPUTARGET=gfx906;gfx908 matches the first if clause, for MI50, even when compiling for MI100. As a workaround, I commented out set(HIP_AMDGPUTARGET "default") in the config.cmake of the standalone benchmark and forced the use of the MI100 parameters via:

cmake -DCMAKE_INSTALL_PREFIX=../ -DHIP_AMDGPUTARGET="gfx908" ~/alice/O2/GPU/GPUTracking/Standalone/

I did not investigate this further.
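The suspected selection logic can be illustrated with a hedged sketch (the actual checks in config.cmake may differ): a target string containing gfx906 selects the MI50 parameters first, even when gfx908 (MI100) is also listed, which matches the behaviour observed above:

```python
def select_params(hip_amdgputarget: str) -> str:
    """Hypothetical reconstruction of the parameter selection in
    config.cmake: the first matching clause wins, so the "default"
    expansion 'gfx906;gfx908' always picks the MI50 parameters."""
    if "gfx906" in hip_amdgputarget:   # MI50 clause checked first
        return "MI50 parameters"
    if "gfx908" in hip_amdgputarget:   # MI100 clause never reached for "default"
        return "MI100 parameters"
    return "default parameters"

print(select_params("gfx906;gfx908"))  # "default" expansion -> MI50 parameters
print(select_params("gfx908"))         # forcing gfx908 only -> MI100 parameters
```

This is why passing -DHIP_AMDGPUTARGET="gfx908" explicitly works around the problem: the MI50 clause no longer matches.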

Possible ideas for automated optimization after the manual one

  1. Isolate the parameters which are dependent, i.e. parameters of kernels from the same task that run in parallel (e.g. the Clusterizer chain)
  2. Apply known optimization techniques to such kernel groups
    1. Grid/random search
    2. Bayesian optimization?
      See: F.-J. Willemsen, R. Van Nieuwpoort, and B. Van Werkhoven, “Bayesian Optimization for auto-tuning GPU kernels”, International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) at Supercomputing (SC21), 2021. Available: https://arxiv.org/abs/2111.14991
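As a first step, idea 2a could look like the random-search sketch below. All names are hypothetical, and kernel_time is a synthetic stand-in for actually launching a kernel and measuring its runtime (e.g. with the standalone benchmark):

```python
import random

# Candidate launch parameters; real constraints depend on the GPU
# (e.g. block size should be a multiple of the wavefront size, 64 on MI100).
BLOCK_SIZES = [64, 128, 256, 512, 1024]
GRID_SIZES = [60, 120, 240, 480]

def kernel_time(grid: int, block: int) -> float:
    """Stand-in for benchmarking a kernel at the given launch parameters.
    A real tuner would run the standalone benchmark here; this is a
    purely synthetic cost model for illustration."""
    return abs(block - 256) / 256 + abs(grid - 240) / 240

def random_search(n_trials: int = 20, seed: int = 0):
    """Sample launch-parameter pairs at random and keep the fastest."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        g = rng.choice(GRID_SIZES)
        b = rng.choice(BLOCK_SIZES)
        t = kernel_time(g, b)
        if best is None or t < best[0]:
            best = (t, g, b)
    return best

t, g, b = random_search()
print(f"best: grid={g} block={b} time={t:.3f}")
```

Grid search would simply replace the random sampling with an exhaustive loop over the product of the candidate lists; Bayesian optimization (as in the Willemsen et al. paper above) would replace the random choice with a surrogate-model-guided pick.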