Newer generations of processors no longer bring increases in clock frequency, and the same holds for memory chips. To achieve higher performance, core counts keep rising, and to feed all the cores on a chip with instructions and data, the number of memory channels must follow the same trend. The Non-Uniform Memory Access (NUMA) architecture has allowed CPU manufacturers to mitigate memory-subsystem bottlenecks considerably, but this solution in turn introduces a cost at the application level. This paper describes our practical experience with the typical CPU servers currently available to the HEP community, based on work with NUMA systems at CERN openlab. We present the latest measurements of the different NUMA implementations from AMD and Intel, as well as the consequences of NUMA for several parallelized HEP codes.