The Effect of NUMA Tunings on CPU Performance

Apr 14, 2015, 6:15 PM
B503



Oral presentation, Track 8: Performance increase and optimization exploiting hardware features (Track 8 Session)


Christopher Hollowell (Brookhaven National Laboratory)


Non-uniform memory access (NUMA) is a memory architecture for symmetric multiprocessing (SMP) systems in which each processor is directly connected to its own local memory. A processor can still access memory attached to another CPU (remote memory), but such requests are slower because they must pass through the CPU that controls that memory. Combined with a NUMA-aware operating system, the NUMA hardware architecture can help eliminate the memory performance degradation generally seen in SMP systems when multiple processors attempt to access memory simultaneously. The x86 CPU architecture has supported NUMA for a number of years. Modern operating systems such as Linux support NUMA-aware scheduling, in which the OS attempts to schedule a process on the CPU directly attached to the majority of its RAM. In Linux, the NUMA subsystem can be further tuned manually with the "numactl" utility. With the release of Red Hat Enterprise Linux (RHEL) 6.3, the "numad" daemon became available in that distribution. This daemon monitors a system's NUMA topology and utilization, and automatically makes adjustments to optimize memory locality. As the number of cores in x86 servers continues to grow, efficient mapping of processes to CPUs and memory will become increasingly important. This presentation gives a brief overview of NUMA and discusses the effects of manual tunings and of numad on the performance of the HEP-SPEC06 benchmark.
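To make the idea of a manual numactl tuning concrete, here is a minimal Python sketch. It assumes a Linux host that exposes its NUMA topology under /sys/devices/system/node (one nodeN/cpulist file per node). The sketch reads each node's CPU list and builds command lines of the form `numactl --cpunodebind=N --membind=N <command>`, which restrict a process to one node's CPUs and allocate its memory only from that node. The helper names, the round-robin placement policy and the `./benchmark_copy` placeholder are hypothetical illustrations, not the tunings evaluated in this presentation. numad, by contrast, makes such placement decisions dynamically while processes run.

```python
import glob
import os
import re
import shlex


def parse_cpulist(text):
    """Expand a Linux cpulist string such as "0-3,8,10-11" into a sorted list of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty cpulist, e.g. a memory-only node
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def read_numa_topology(sysfs_root="/sys/devices/system/node"):
    """Return {node_id: [cpu ids]} read from sysfs; empty dict if unavailable (e.g. non-Linux)."""
    topology = {}
    for path in glob.glob(os.path.join(sysfs_root, "node[0-9]*")):
        node_id = int(re.search(r"node(\d+)$", path).group(1))
        try:
            with open(os.path.join(path, "cpulist")) as f:
                topology[node_id] = parse_cpulist(f.read())
        except OSError:
            continue
    return dict(sorted(topology.items()))


def numactl_command(node, argv):
    """Build a numactl command line binding both CPUs and memory allocations to one node."""
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}"] + list(argv)


def assign_round_robin(n_jobs, nodes):
    """Hypothetical policy: spread n_jobs process copies evenly across the given NUMA nodes."""
    return [nodes[i % len(nodes)] for i in range(n_jobs)]


if __name__ == "__main__":
    # Fall back to a single fake node when no sysfs topology is available.
    topo = read_numa_topology() or {0: parse_cpulist("0-3")}
    # Only nodes that actually have CPUs can be used with --cpunodebind.
    nodes = [n for n, cpus in topo.items() if cpus] or [0]
    for job, node in enumerate(assign_round_robin(4, nodes)):
        # "./benchmark_copy" is a placeholder, not a real HEP-SPEC06 binary name.
        print(shlex.join(numactl_command(node, ["./benchmark_copy", str(job)])))
```

On a two-socket machine this prints four command lines alternating between nodes 0 and 1. Binding memory with `--membind` (rather than only pinning CPUs) keeps each copy's allocations local, which is the locality that NUMA-aware scheduling and numad also try to achieve.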

Primary author

Christopher Hollowell (Brookhaven National Laboratory)


Co-authors

Mr Alexandr Zaytsev (Brookhaven National Laboratory (US))
Costin Caramarcu (Brookhaven National Laboratory (US))
Dr Tony Wong (Brookhaven National Laboratory)
William Strecker-Kellogg (Brookhaven National Lab)
