Niko Neufeld (CERN)
The ARM architecture is a power-efficient design used in most mobile-device processors worldwide, since it provides reasonable compute performance per watt. The current LHCb software stack is designed (and expected) to build and run on machines with the x86/x86_64 architecture. This paper outlines the process of measuring the performance of the LHCb software stack on the ARM architecture (specifically, the ARMv7 architecture on Cortex-A9 processors from NVIDIA, and on full-fledged ARM servers with Calxeda chipsets) and compares it with the performance on the x86_64 architecture on the Intel Xeon 5650 and AMD Opteron 6272. The paper emphasises performance per core relative to the power drawn by the compute nodes, which ensures a fair real-world comparison with the much more 'powerful' Intel/AMD processors. These comparisons of real workloads in a HEP context are complemented with standard synthetic benchmarks such as HEPSPEC, LMBench and CoreMark. The pitfalls of, and solutions for, the non-trivial task of porting the source code to build for the ARMv7 instruction set are presented. The specific changes to the build process needed for ARM-specific portions of the software stack are described, to serve as pointers for further attempts by other groups in this direction. Cases where architecture-specific tweaks at the assembler level (both in ROOT and the LHCb software stack) were needed for a successful compile are detailed; these cases are good indicators of where and how the software stack and the build system can be made more portable and multi-arch friendly. The experience gained from the tasks described in this paper is intended to i) assist in making an informed choice about ARM-based server solutions as a feasible low-power alternative to the current compute nodes, and ii) prompt a revisit of the software design and build system for portability and generic improvements.