In recent years the scale of scientific computing has grown significantly. Computing facilities have grown to the point where energy availability and cost have become important factors limiting data-center size and density. At the same time, power-density limitations in the processors themselves are driving interest in more heterogeneous processor architectures. Application performance must therefore be optimized not merely to obtain results faster, but also to stay within the economic limits and constraints imposed by power-hungry data centers. IgProf is an open-source, general-purpose memory and performance profiler suite. We present the improvements we have made to IgProf to permit optimizing the power efficiency of an application. This functionality builds on direct measurements of power-related quantities, using a newly developed IgProf module that exploits novel on-chip power-monitoring capabilities (such as RAPL on recent Intel processors). We also explore indirect methods, which extrapolate power consumption from other measured application characteristics and processor behaviors, using CPU performance counters and processor power states. We demonstrate the use of our tools both on small micro-benchmarks, developed to better understand the problems and to tune the measurements, and on complex, large-scale C++ applications built from millions of lines of code.

