Description
These lectures address the development of scientific applications for multicore computing platforms containing GPU devices as accelerators.
The key goal is to develop practical skills to code applications that run efficiently across different computing systems.
Many scientists have already attended parallel programming courses to take advantage of multi/many-core computing units, in both x86 and CUDA/GPU environments. However, their code may not use the available resources efficiently. The first lecture addresses the expertise needed to evaluate performance scalability.
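As a point of reference (standard metrics, not necessarily the only ones covered in the lecture), scalability is commonly quantified through the speedup S(p) = T(1) / T(p) and the parallel efficiency E(p) = S(p) / p, measured for an increasing number of processing units p, either with a fixed problem size (strong scaling) or with the problem size growing with p (weak scaling).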
Even when the code is already tuned for either x86 processors or GPU devices, distributing data and scheduling tasks efficiently between these two types of computing units is an extra burden that must be revisited for each distinct device or hardware generation. The second lecture explores frameworks that assist with data domain partitioning and manage these complexities across distinct computing units at runtime, allowing developers to write code once and carry its performance across different computing platforms.
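As a minimal illustration of this "write once, run on different platforms" idea, the sketch below uses OpenMP target offloading; this is only one possible mechanism, and the lectures may instead focus on other runtime frameworks. The same loop runs on a GPU when a compatible device and compiler are available, and falls back to the host cores otherwise.

    #include <stdio.h>
    #include <stdlib.h>

    /* Scale a vector on whichever device the runtime selects: if a GPU is
     * present and offloading is supported, the loop runs there; otherwise
     * it executes on the host cores, with no change to the source code. */
    void scale(float *x, int n, float alpha)
    {
        #pragma omp target teams distribute parallel for map(tofrom: x[0:n])
        for (int i = 0; i < n; ++i)
            x[i] *= alpha;
    }

    int main(void)
    {
        int n = 1 << 20;
        float *x = malloc(n * sizeof *x);
        for (int i = 0; i < n; ++i)
            x[i] = 1.0f;

        scale(x, n, 2.0f);           /* same source for CPU and GPU runs */

        printf("x[0] = %f\n", x[0]); /* expect 2.0 */
        free(x);
        return 0;
    }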