When writing a new scientific application, portability across existing and future hardware should be the major design goal, because there is a multitude of different compute devices and codes typically outlive the systems they run on by far. Unlike other programming models that address parallelism or heterogeneity, OpenCL provides practical portability across a wide range of HPC-relevant architectures, and it offers further advantages such as a library-only implementation and runtime kernel compilation.
We present our experiences with utilising OpenCL alongside C++, MPI, and CMake in two real-world codes. Our main target is a Cray XC40 supercomputer with multi- and many-core (Xeon Phi) CPUs; we also target smaller systems with Nvidia and AMD GPUs. We shed light on practical issues that arise in this setting, such as the interaction between OpenCL and MPI, discuss solutions, and point out current limitations of OpenCL in scientific HPC from the point of view of both application developers and users.