Philippe Canal (Fermi National Accelerator Lab. (US))
The recent prevalence of many-core and accelerated hardware architectures opens opportunities for concurrent programming models that take advantage of both SIMD and SIMT architectures. The Geant Vector Prototype has been designed both to exploit the vector capabilities of mainstream CPUs and to take advantage of coprocessors, including NVIDIA GPUs and the Intel Xeon Phi. These architectures differ greatly in vectorization depth, in the degree of parallelization needed to achieve optimal performance, and in memory access latency and speed. The number of individual tasks that must be processed ‘at once’ for efficient use of the hardware varies between platforms, sometimes by an order of magnitude. The granularity of the executed code may also need to be adjusted dynamically. An additional challenge is to avoid the code duplication often inherent in supporting heterogeneous platforms. We will present the challenges, solutions and resulting performance of running an end-to-end detector simulation concurrently on a mainstream CPU and a coprocessor, and detail the broker implementation that bridges the disparity between the two architectures. The impact of task decomposition, vectorization, efficient sampling techniques and data look-up using track-level parallelism will also be evaluated on vector and massively parallel architectures.
Andrei Gheata (CERN); Mr Federico Carminati (CERN); Georgios Bitzes (National and Kapodistrian University of Athens (GR)); Guilherme Lima (FermiLab (US)); Johannes Christof de Fine Licht (University of Copenhagen (DK)); John Apostolakis (CERN); Laurent Duhem; Marilena Bandieramonte (Universita e INFN (IT)); Mihaly Novak (CERN); Oksana Shadura (National Technical Univ. of Ukraine "Kyiv Polytechnic Institute"); Raman Sehgal (Bhabha Atomic Research Centre (IN)); Dr Rene Brun (CERN); Sandro Christian Wenzel (CERN); Victor Daniel Elvira (Fermi National Accelerator Lab. (US))