OpenCL

Clusterizer Issues

-> With both fixes: Clusterizer now works in OpenCL

Recursion in TPCFastTransformation

Lots of helper functions written with compile-time recursion, e.g.:

GPUd() static constexpr uint32_t factorial(const uint32_t n) { return (n == 0) || (n == 1) ? 1 : n * factorial(n - 1); }

Fine for CUDA / HIP, but OpenCL C++ prohibits recursion. Trivial to fix in C++20 with consteval, but OpenCL C++ is still based on C++17, so use templates for the recursion instead: https://github.com/AliceO2Group/AliceO2/pull/14462

-> Waiting for feedback from TPC 
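
A minimal sketch of the template-based workaround, assuming the argument is known at compile time. This is not the code from the PR: the function template below is a hypothetical illustration, and the GPUd() macro is left out so the snippet compiles standalone. Each factorial<N> is a separate instantiation, so no function ever calls itself:

```cpp
#include <cstdint>

// Hypothetical replacement for the recursive constexpr helper (C++17).
// factorial<N>() calls factorial<N - 1>(), which is a different function,
// so the call graph contains no recursion.
template <uint32_t N>
constexpr uint32_t factorial()
{
  if constexpr (N <= 1) {
    return 1;
  } else {
    return N * factorial<N - 1>();
  }
}

// Evaluated entirely at compile time.
static_assert(factorial<5>() == 120, "5! must be 120");
```

A call like factorial(4) then becomes factorial<4>(), which only works where the argument is a constant expression.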

PoCL issues

PoCL rejects recursive functions, but the error only names the function that makes the call, not the recursive function it found. E.g. you get errors like this:

Recursion detected in function: '_ZNU3AS42o23gpu7GPUdEdx11fillClusterEffihffRU3AS4KNS0_23GPUCalibObjectsTemplateINS0_8ConstPtrEEEfff'

Made debugging the issue in TPCFastTransform very tedious.

Patched PoCL to demangle C++ symbols + print the infringing function. Errors now look like this:

Recursion detected in function: 'o2::gpu::GPUdEdx::fillCluster(float, float, int, unsigned char, float, float, o2::gpu::GPUCalibObjectsTemplate<o2::gpu::ConstPtr> const&, float, float, float)'
-> Infringing function: 'o2::gpu::MultivariatePolynomialParametersHelper::factorial(unsigned int)'

(Side note: the LLVM demangler can't demangle OpenCL symbols because of the address space qualifiers in the mangled name. They have the form 'U3AS4', as in the symbol above. -> Hack: prune the qualifiers before demangling...)
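
The pruning hack could look roughly like the sketch below. It is not the actual PoCL patch: pruneAddressSpaces and demangleOpenCL are hypothetical names, and abi::__cxa_demangle from cxxabi.h (GCC / Clang) stands in for the LLVM demangler. It assumes the qualifiers are written as 'U', a length, then "AS" and the address space number, like the 'U3AS4' in the symbol above. It scans the raw string, so in principle it could also strip a matching sequence inside an identifier:

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: drop every qualifier of the form U + length + "AS" + number
// (e.g. "U3AS4"), where the length counts the characters of the "AS4" part.
std::string pruneAddressSpaces(const std::string& mangled)
{
  auto isDigit = [](char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; };
  std::string out;
  std::size_t i = 0;
  while (i < mangled.size()) {
    if (mangled[i] == 'U') {
      // Parse the decimal length that follows the 'U'.
      std::size_t j = i + 1;
      std::size_t len = 0;
      while (j < mangled.size() && isDigit(mangled[j])) {
        len = len * 10 + static_cast<std::size_t>(mangled[j] - '0');
        ++j;
      }
      // The name must fit into the string, start with "AS" and continue with digits only.
      bool match = len >= 3 && len <= mangled.size() - j && mangled.compare(j, 2, "AS") == 0;
      for (std::size_t k = j + 2; match && k < j + len; ++k) {
        match = isDigit(mangled[k]);
      }
      if (match) {
        i = j + len; // skip the whole qualifier
        continue;
      }
    }
    out += mangled[i++];
  }
  return out;
}

// Hypothetical wrapper: prune, then demangle. Falls back to the input on failure.
std::string demangleOpenCL(const std::string& mangled)
{
  const std::string pruned = pruneAddressSpaces(mangled);
  int status = 0;
  char* result = abi::__cxa_demangle(pruned.c_str(), nullptr, nullptr, &status);
  if (status != 0 || result == nullptr) {
    return mangled;
  }
  std::string out(result);
  std::free(result); // __cxa_demangle returns a malloc'ed buffer
  return out;
}
```

The length has to be parsed explicitly: in '_ZNU3AS42o2...' the qualifier is 'U3AS4' and the following '2' already belongs to the next name, so a greedy "AS plus digits" match would eat one character too many. Applied to the mangled name from the error above, demangleOpenCL should give back the fillCluster signature shown in the patched output.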

Other issues

Couple of other issues that make debugging harder in PoCL: