Intel® Threading Building Blocks has become a very popular choice as a parallelization framework at CERN. It is used in several large frameworks developed by the CERN community allowing them to take advantage of shared-memory parallelization.
This workshop is aimed at experienced TBB users who want to learn advanced parallelization techniques, going beyond parallel_for and pipelines. In particular, attendees will learn about the TBB Flow Graph, its uses, and its limitations. The workshop will offer both presentations and hands-on sessions.
This workshop, organized in collaboration with Intel, is a great opportunity to improve your application's performance and to put your questions to Intel software experts. Sign up now!
The workshop will be delivered by three experts from Intel's Software Group:
- Alexei Katranov is a software engineer at Intel with almost 10 years of professional experience in parallel programming and C++. He is involved in multiple parallelism-related activities and owns the development of the Intel TBB task scheduler. He gave talks on heterogeneous computation in TBB at IWOCL in 2016 and SES in 2017;
- Aleksei Fedotov is a software engineer at Intel. He spent several years working on various TBB features, such as the parallel algorithms, the concurrent containers, and C++11 support. He now leads the architecture and development of the Flow Graph API, including its support for heterogeneity. His interests include parallel computer architectures, parallel programming, runtime development, optimization, and machine learning;
- Cédric Andreolli is a software engineer at Intel. He worked for several years as an application engineer optimizing HPC code for oil and gas customers. He has experience in vectorization and threading, and he now works as a technical consulting engineer focusing on compilers. He has also worked on tools such as the Roofline Model, used to characterize HPC applications.
Due to energy constraints, high performance computing platforms are becoming increasingly heterogeneous, achieving greater performance per watt through the use of hardware that is tuned to specific computational kernels or application domains. It can be challenging for developers to match computations to accelerators, choose models for targeting those accelerators, and then coordinate the use of those accelerators in the context of their larger applications.

This tutorial starts with a survey of heterogeneous architectures and programming models, and discusses how to determine whether a computation is suitable for a particular accelerator. Next, Intel® Threading Building Blocks (Intel® TBB), a widely used, portable C++ template library for parallel programming, is introduced. TBB is available both as a commercial product and as a permissively licensed open-source project at http://www.threadingbuildingblocks.org. The library provides generic parallel algorithms, concurrent containers, a work-stealing task scheduler, a data flow programming abstraction, low-level synchronization primitives, thread-local storage, and a scalable memory allocator. The generic algorithms in TBB capture many of the common design patterns used in parallel programming.

While TBB was first introduced in 2006 as a shared-memory parallel programming library, it has recently been extended to support heterogeneous programming. These new extensions allow developers to more easily coordinate the use of accelerators, such as integrated and discrete GPUs or FPGAs, in their parallel C++ applications. This tutorial will introduce students to the TBB library and provide a hands-on opportunity to use some of its features for shared-memory programming.
Students will then be given an overview of the new features included in the library for heterogeneous programming, and will have a hands-on opportunity to convert an example they developed for shared memory into one that performs hybrid execution on both the CPU and an accelerator. Finally, students will be given an overview of the TBB Flow Graph Analyzer tool and shown how it can be used to understand application inefficiencies related to the utilization of system resources.