Gabriele will most likely be the person from ALICE to take care of it.

We could provide a recipe to build a container to run the benchmark, or even some way to just run it from CVMFS.

The most practical approach is most likely:
- We take all the dependencies from CVMFS, which will keep the build times short.
- We build the standalone benchmark for the container, with some option to define which architectures to build for. This will allow it to run on future hardware as well.
- We have to see how to provide the data sets for the standalone benchmark (as part of the container? On CVMFS?)
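
As a rough illustration of the "option to define which architectures to build for", the container recipe could translate a user-supplied list into build definitions. The sketch below is only an assumption about how this might look: the environment variable names `BENCH_CUDA_ARCHS` and `BENCH_HIP_ARCHS` and the helper `cmake_arch_flags` are hypothetical, and it assumes the benchmark build honours the standard CMake variables `CMAKE_CUDA_ARCHITECTURES` and `CMAKE_HIP_ARCHITECTURES`, which these notes do not confirm.

```python
import os


def cmake_arch_flags(env=os.environ):
    """Turn comma-separated architecture lists into CMake -D definitions.

    BENCH_CUDA_ARCHS and BENCH_HIP_ARCHS are hypothetical variable names
    chosen for this sketch; they are not part of any existing recipe.
    """
    flags = []
    for env_name, cmake_var in (
        ("BENCH_CUDA_ARCHS", "CMAKE_CUDA_ARCHITECTURES"),
        ("BENCH_HIP_ARCHS", "CMAKE_HIP_ARCHITECTURES"),
    ):
        # Split on commas and drop empty entries and surrounding whitespace.
        archs = [a.strip() for a in env.get(env_name, "").split(",") if a.strip()]
        if archs:
            # CMake expects a semicolon-separated list.
            flags.append(f"-D{cmake_var}={';'.join(archs)}")
    return flags
```

With nothing set, the helper returns an empty list, so the build would fall back to whatever default architectures it already uses; supporting a new GPU would then only require rebuilding with a longer list.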

It is not necessary to update often, but we might need to update to support new CUDA/ROCm versions or new GPUs.

We should run the standalone benchmark in both sync and async mode and report the two performance figures independently, so that they get results for both online and offline processing.
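
A minimal sketch of such a driver, assuming the benchmark can be started once per mode from the command line. The binary name `./standalone-benchmark` and the `--mode` option are placeholders invented for this example, not the real interface, and wall time is used only as a stand-in for whatever figure of merit the benchmark actually reports.

```python
import subprocess
import time

# Placeholder commands: the real binary name and mode options are not
# given in these notes and must be replaced.
MODE_COMMANDS = {
    "sync": ["./standalone-benchmark", "--mode", "sync"],
    "async": ["./standalone-benchmark", "--mode", "async"],
}


def run_command(cmd):
    """Run one benchmark command and return its wall time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def run_modes(commands, runner=run_command):
    """Run the modes one after another and keep a separate result per mode."""
    return {mode: runner(cmd) for mode, cmd in commands.items()}
```

Calling `run_modes(MODE_COMMANDS)` would return one number per mode, which can then be exported as two independent scores rather than a combined one.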

This will load only the GPU plus one CPU core.
That should be enough to start with.
To load GPU and CPU fully, we would need to run other algorithms on the CPU, which requires manual tuning like for async reco on the EPNs; that is infeasible in a generic form for the time being.
We could, however, think about running the standalone benchmark on CPU and GPU in parallel, which would not be too complicated to develop and could yield a benchmark that fully loads the server.
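
The parallel CPU + GPU idea could be prototyped by launching two benchmark processes at the same time and collecting one result per device. The sketch below makes several assumptions: the binary name `./standalone-benchmark` and the `--device` option are hypothetical, and it does not deal with CPU core pinning or with keeping the GPU instance's host thread from competing with the CPU instance.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor

# Placeholder commands: binary name and device option are invented for
# this sketch and must be replaced by the real interface.
DEVICE_COMMANDS = {
    "gpu": ["./standalone-benchmark", "--device", "gpu"],
    "cpu": ["./standalone-benchmark", "--device", "cpu"],
}


def timed_run(cmd):
    """Run one benchmark process and return its wall time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def run_in_parallel(commands, runner=timed_run):
    """Start all commands concurrently and return one result per device."""
    # One worker thread per command; each thread only waits on its process.
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        futures = {dev: pool.submit(runner, cmd) for dev, cmd in commands.items()}
        return {dev: fut.result() for dev, fut in futures.items()}
```

Threads are sufficient here because the actual work happens in the child processes; the Python side only waits for them to finish.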