Many emerging applications require high-performance computing resources, yet they are not always ready to use such resources efficiently. As a result, we observe significant resource waste when such applications are deployed on real heterogeneous, potentially distributed hardware. To avoid this waste, performance engineering methods and tools can be used to identify performance bounds and bottlenecks at the application and system levels. In this tutorial, we present an introduction, with examples, to performance engineering methods and tools.
To put these ideas into practice, we briefly consider a connected component labeling problem from a track reconstruction application implemented for GPGPUs. Through this example, we aim to show how naive implementations of algorithms can lead to wasteful use of hardware resources, and how profiling tools and models can guide us toward implementing zero-waste applications.