Speaker
Dr. Don Holmgren (Fermilab)
Description
As part of the DOE LQCD-ext project, Fermilab designs, deploys, and operates dedicated high-performance clusters for parallel lattice QCD (LQCD) computations. Multicore processors benefit LQCD simulations and have contributed to the steady decrease in price/performance for these calculations over the last decade. We currently operate two large conventional clusters: the older with over 6,800 AMD Barcelona cores distributed across 8-core systems interconnected with DDR InfiniBand, and the newer with over 13,400 AMD Magny-Cours cores distributed across 32-core systems interconnected with QDR InfiniBand. We will describe the design and operation of these clusters, their performance, the benchmarking data used to select the hardware, and the techniques used to handle their NUMA architectures.
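For illustration, below is a minimal sketch of one common NUMA-handling technique of the kind alluded to above; the exact approach used on these clusters is not given here, and the use of libnuma and the round-robin rank-to-node mapping are assumptions about how ranks are placed. The idea is to bind each MPI rank to a single NUMA node so that, under Linux's first-touch policy, its working set stays in node-local memory.

    /* Hedged sketch (not necessarily Fermilab's method): bind each MPI
     * rank to one NUMA node so the rank's lattice data is allocated in
     * local memory. Requires libnuma (link with -lnuma) and MPI.
     */
    #include <mpi.h>
    #include <numa.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (numa_available() < 0) {
            fprintf(stderr, "rank %d: no NUMA support\n", rank);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        /* Round-robin ranks over the host's NUMA nodes. This assumes
         * the MPI launcher places ranks consecutively on each host;
         * production job scripts often achieve the same effect with
         * numactl or the launcher's own binding options. */
        int nodes = numa_num_configured_nodes();
        int node  = rank % nodes;

        numa_run_on_node(node);   /* restrict this rank's CPUs to the node */
        numa_set_preferred(node); /* prefer allocations from the node      */

        /* ... NUMA-local LQCD solver work would run here ... */

        MPI_Finalize();
        return 0;
    }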
We will also discuss the design, operation, and performance of a GPU-accelerated cluster that Fermilab will deploy in late November 2011. This cluster will have 152 NVIDIA Fermi GPUs distributed across 76 servers coupled with QDR InfiniBand. In the last several years, GPUs have increased the throughput of some LQCD simulations more than tenfold compared with conventional hardware of the same cost. These LQCD codes have evolved from using a single GPU, to multiple GPUs within a server, and now to multiple GPUs distributed across a cluster. The primary design goal of this cluster is to optimize LQCD simulations that span large GPU counts.
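As a sketch of the cluster-wide multi-GPU pattern described above (the device-selection scheme and rank placement are assumptions, not details from the talk), each MPI rank must claim one GPU before making any device allocations; with two GPUs per server, as on this cluster, two ranks per node each take a distinct local device.

    /* Hedged sketch of rank-to-GPU mapping for a multi-GPU LQCD code.
     * Compile with nvcc and link against an MPI library.
     */
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, ndev;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        cudaGetDeviceCount(&ndev);

        /* On 2-GPU servers, running two ranks per node and mapping
         * rank % ndev gives each rank a distinct local GPU, assuming
         * the launcher places ranks consecutively on each node. */
        cudaSetDevice(rank % ndev);

        /* ... device memory is allocated and the GPU solver runs here;
         * halo exchange between GPUs travels via MPI over InfiniBand ... */

        MPI_Finalize();
        return 0;
    }

Selecting the device once per rank, before any allocations, keeps all of a rank's CUDA state on one GPU; the inter-GPU communication then reduces to ordinary MPI traffic over the InfiniBand fabric.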
Author
Dr. Don Holmgren (Fermilab)