SoA Benchmarks

We have set up a benchmark repository: https://github.com/cern-nextgen/wp1.7-soa-benchmark
We spotted the following issues:
- Much lower performance with GCC on one specific example: fixed by David through force-inlining.
- In some examples, some loops were not vectorized: fixed.
- Both the baseline and our code are 2x slower with Clang than with GCC on one specific example: still investigating.
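The GCC fix above relied on force-inlining. The sketch below illustrates the general idea. It assumes the slowdown came from small accessors that were not inlined inside a hot loop. The macro `SOA_FORCE_INLINE`, the type `Particles`, and the function `scale` are hypothetical and are not taken from the benchmark repository.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical force-inline macro. Assumption: GCC and Clang accept
// __attribute__((always_inline)); MSVC spells it __forceinline.
#if defined(__GNUC__) || defined(__clang__)
#define SOA_FORCE_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#define SOA_FORCE_INLINE __forceinline
#else
#define SOA_FORCE_INLINE inline
#endif

// Hypothetical two-member SoA, not the benchmark's actual type.
struct Particles {
    std::vector<float> x, y;

    // Forcing these accessors inline exposes the raw array accesses
    // to the optimizer inside the caller's loop.
    SOA_FORCE_INLINE float& x_at(std::size_t i) { return x[i]; }
    SOA_FORCE_INLINE float& y_at(std::size_t i) { return y[i]; }
};

// Scales every particle. Once the accessors are inlined, this is a
// candidate loop for auto-vectorization.
void scale(Particles& p, float s) {
    assert(p.x.size() == p.y.size());  // SoA members must stay in sync
    for (std::size_t i = 0; i < p.x.size(); ++i) {
        p.x_at(i) *= s;
        p.y_at(i) *= s;
    }
}
```

Inlining alone does not guarantee vectorization. It only removes the call boundary, so the compiler's auto-vectorizer can see the plain array accesses.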

Simplify the SoA Code

We got feedback from other developers using our code in their frameworks.

They said the code is too complicated. In particular, too many template parameters have to be specified explicitly.

For example, the following call evaluates every member of a struct of arrays at index i:

helper::apply_to_members<M, const array_type&, proxy_type<const_reference, S>>(*this, evaluate_at<F>(i));
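The sketch below makes "evaluate every member at index i" concrete. It rests on our own assumptions and is not the real `helper::apply_to_members`. The names `Hits`, `element_at`, and the member function `apply_to_members` are hypothetical, and a `std::tuple` of references stands in for the proxy type.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical functor in the role of evaluate_at<F>(i): given any
// member array, it returns a reference to that array's element i.
struct element_at {
    std::size_t i;
    template <class Array>
    auto& operator()(Array& a) const {
        assert(i < a.size());
        return a[i];
    }
};

// Hypothetical SoA: one contiguous array per member.
struct Hits {
    std::vector<float> x, y;
    std::vector<int> id;

    // Apply f to each member array and bundle the results. With element_at,
    // this yields a tuple of references that acts as a proxy for row i.
    template <class F>
    auto apply_to_members(F f) {
        return std::tie(f(x), f(y), f(id));
    }
};
```

Writing through the returned tuple modifies the underlying arrays, which is the behaviour a mutable row proxy needs. A read-only proxy would hold `const` references instead.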

We simplified the template code. The call above, for example, now reads:

helper::apply_to_members<const_reference>(*this, evaluate_at<F>(i));
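One way to get a call of this shape is to let template argument deduction supply everything except the reference kind. The sketch below assumes that design. The tag types `reference` and `const_reference`, the `members()` accessor, the type `Points`, and the functor `at_index` are hypothetical and only mimic the shape of the new call.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <vector>

// Hypothetical tags selecting the kind of reference the caller wants.
struct reference {};
struct const_reference {};

// Hypothetical functor standing in for evaluate_at<F>(i).
struct at_index {
    std::size_t i;
    template <class Array>
    auto& operator()(Array& a) const {
        assert(i < a.size());
        return a[i];
    }
};

// Hypothetical SoA exposing its member arrays as a tuple of references.
struct Points {
    std::vector<float> x, y;
    auto members() { return std::tie(x, y); }
    auto members() const { return std::tie(x, y); }
};

namespace helper {
// Only the reference kind is spelled out by the caller. The SoA type and
// the functor type are deduced from the arguments.
template <class RefKind, class Soa, class F>
auto apply_to_members(Soa& soa, F f) {
    // const_reference forces read-only access to the members.
    constexpr bool read_only = std::is_same_v<RefKind, const_reference>;
    using View = std::conditional_t<read_only, const Soa&, Soa&>;
    View view = soa;
    // Apply f to each member array; collect the results as references.
    return std::apply([&](auto&... m) { return std::tie(f(m)...); },
                      view.members());
}
}  // namespace helper
```

For example, `helper::apply_to_members<const_reference>(p, at_index{1})` yields read-only references to `p.x[1]` and `p.y[1]`, while passing `reference` allows writing through them. This sketch requires C++17 (`std::apply`, `std::is_same_v`).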

CI Pipeline for the O2 Standalone Benchmark

We got the OK from Ricardo Rocha to set up a CI pipeline that runs the O2 standalone benchmark on NGT hardware.
I am building a proof-of-concept pipeline that does the following:
- Compile on a GitHub-hosted runner
- Copy the executables to the NGT self-hosted runners
- Run the executables on the NGT GPUs
The goal is a CI pipeline that tests and benchmarks O2 standalone this way.