23–28 Oct 2022
Villa Romanazzi Carducci, Bari, Italy
Europe/Rome timezone

Portable Programming Model Exploration for LArTPC Simulation in a Heterogeneous Computing Environment: OpenMP vs. SYCL

25 Oct 2022, 17:20
20m
Sala Federico II (Villa Romanazzi)

Oral Track 1: Computing Technology for Physics Research

Speaker

Dr Meifeng Lin (Brookhaven National Laboratory (US))

Description

The evolution of the computing landscape has resulted in the proliferation of diverse hardware architectures, with different flavors of GPUs and other compute accelerators becoming more widely available. To facilitate the efficient use of these architectures in a heterogeneous computing environment, several programming models are available to enable portability and performance across different computing systems, such as Kokkos, SYCL, OpenMP and others. As part of the High Energy Physics Center for Computational Excellence (HEP-CCE) project, we investigate if and how these different programming models may be suitable for experimental HEP workflows through a few representative use cases. One such use case is the Liquid Argon Time Projection Chamber (LArTPC) simulation, which is essential for LArTPC detector design, validation and data analysis. Following up on our previous investigations [1, 2] of using Kokkos to port LArTPC simulation in the Wire-Cell Toolkit (WCT) to GPUs, we have explored OpenMP and SYCL as potential portable programming models for WCT, with the goal of making diverse computing resources accessible to the LArTPC simulations. In this presentation, we will describe how we utilize relevant features of OpenMP and SYCL for the LArTPC simulation module in WCT. We will also show performance benchmark results on multi-core CPUs and on NVIDIA and AMD GPUs for both the OpenMP and the SYCL implementations. Comparisons between different compilers will be given. Advantages and disadvantages of using OpenMP, SYCL and Kokkos in this particular use case will also be discussed.

References

[1] Yu, Haiwang; Dong, Zhihua; Knoepfel, Kyle; Lin, Meifeng; Viren, Brett; Yu, Kwangmin. Evaluation of Portable Acceleration Solutions for LArTPC Simulation Using Wire-Cell Toolkit. EPJ Web of Conferences 251, 03032 (2021), EDP Sciences.
[2] Dong, Zhihua; Knoepfel, Kyle; Lin, Meifeng; Viren, Brett; Yu, Haiwang. Evaluation of Portable Programming Models to Accelerate LArTPC Detector Simulations. arXiv preprint arXiv:2203.02479 (2022).

Significance

OpenMP and SYCL are two very different programming models: OpenMP is based on compiler directives, while SYCL is a C++-based framework. OpenMP is easy to add to existing code, whereas using SYCL requires more code changes. This is the first time OpenMP has been used in the context of HEP-CCE. In this presentation we intend to show the feasibility of using OpenMP in C++ code bases to achieve performance portability, while contrasting it with the C++-based frameworks Kokkos and SYCL. We believe this will be of great interest to the broader HEP community.
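To illustrate the directive-based style, here is a minimal sketch of offloading a simple loop with OpenMP. It is not taken from WCT: the function `scale_charges` and its `gain` parameter are hypothetical names chosen for illustration. The loop body is ordinary C++, and the single `#pragma` line is the only OpenMP-specific addition. When compiled without OpenMP support the directive is ignored and the loop runs serially on the host.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of WCT): multiply every charge value by a gain.
std::vector<float> scale_charges(const std::vector<float>& charge, float gain) {
    std::vector<float> out(charge.size());
    const float* in = charge.data();   // raw pointers so array sections can be mapped
    float* res = out.data();
    const std::size_t n = charge.size();

    // Offload the loop to an accelerator if one is available; the map clauses
    // copy the input to the device and the result back to the host.
    #pragma omp target teams distribute parallel for map(to: in[0:n]) map(from: res[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        res[i] = gain * in[i];
    }
    return out;
}
```

An equivalent SYCL version would typically need a queue, explicit buffers or unified shared memory allocations, and the loop body rewritten as a kernel lambda, which is the kind of additional restructuring referred to above.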

The experimental high energy physics community has traditionally relied on homogeneous CPU resources, which have been the main target of many HEP software suites. However, CPU resources alone are not expected to meet the computational requirements of next-generation HEP experiments, such as the Deep Underground Neutrino Experiment (DUNE). We have to adapt our software to utilize the accelerator-based heterogeneous computing resources provided by large-scale high-performance computing facilities. Our exploration of different portable programming models will help guide the software adaptation strategies for WCT, and also inform other HEP software projects of the pros and cons of these programming models.

Experiment context, if any

The Liquid Argon Time Projection Chamber (LArTPC) is a key detector technology that is widely used in current and next-generation experiments for neutrino physics, e.g., DUNE and the SBN program. Neutrino events in the LArTPC manifest a large number of disparate patterns, which creates an opportunity for deep-learning algorithms. However, training such algorithms requires very large data sets to achieve accurate performance, and generating such large data sets remains challenging for traditional x86 CPU-centric computing facilities. Accelerating the LArTPC simulation with heterogeneous architectures could significantly boost the efficiency of algorithm development and improve the accuracy of physics analyses.

Primary authors

Dr Zhihua Dong (Brookhaven National Laboratory)
Dr Kyle Knoepfel (Fermi National Accelerator Laboratory)
Dr Meifeng Lin (Brookhaven National Laboratory (US))
Vincent Pascuzzi (Brookhaven National Laboratory)
Brett Viren (Brookhaven National Laboratory)
Tianle Wang (Brookhaven National Laboratory)
Dr Haiwang Yu (Brookhaven National Laboratory)
