QMCPACK, a scalable quantum Monte Carlo (QMC) package, has been highly optimized for the latest high-end microprocessors: arrays and loops have been restructured to achieve high vectorization ratios, parallelism is easily and efficiently exploited through the Monte Carlo nature of the algorithm, and considerable attention has been paid to using highly tuned MKL libraries. Identifying optimization opportunities and techniques in such a code is challenging. In this talk, we report that performance gains of around 15% can be obtained by using tools that provide nonstandard views of the code's behavior: for example, performing a detailed assessment of code quality beyond standard vectorization, accurately analyzing the performance impact of data accesses, and automatically exploring multiple parallel configurations. This improvement translates directly into energy savings and increased productivity for QMC, which consumes a significant fraction of leadership computing resources such as ALCF's Theta KNL cluster. We also present the tools used and how they provided key insights for improving QMCPACK performance.