Introduces the architecture of heterogeneous platforms, namely the different types of parallelism that can be exploited, and highlights the challenges of developing high-performance code for this environment. Shows the different programming paradigms for both CPUs and CUDA GPUs, supported by code snippets that bridge the gap between key concepts and basic CPU- and GPU-specific optimisations. Discusses a scalability study of a data analysis application on a simple dual-socket computing system, complemented by a similar study on a CUDA GPU, emphasising the need to exploit the performance of both CPU and GPU devices simultaneously.
Target audience: This lecture is aimed at physicists and computer scientists developing compute-intensive, data-parallel applications.
Benefits of attending the lecture: To improve skills in writing high-performance, scalable applications that take advantage of any multicore CPU coupled to a CUDA GPU accelerator.
Prerequisites: Experience in C++ application development on current computing platforms (laptops/desktops, computing clusters) and basic knowledge of the CUDA environment.