21–25 May 2012
New York City, NY, USA
US/Eastern timezone

Fermilab Multicore and GPU-Accelerated Clusters for Lattice QCD

22 May 2012, 13:30
4h 45m
Rosenthal Pavilion (10th floor) (Kimmel Center)

Poster

Computer Facilities, Production Grids and Networking (track 4)

Poster Session

Speaker

Dr Don Holmgren (Fermilab)

Description

As part of the DOE LQCD-ext project, Fermilab designs, deploys, and operates dedicated high-performance clusters for parallel lattice QCD (LQCD) computations. Multicore processors benefit LQCD simulations and have contributed to the steady decrease in the price/performance ratio for these calculations over the last decade. We currently operate two large conventional clusters: the older has over 6,800 AMD Barcelona cores distributed across 8-core systems interconnected with DDR InfiniBand, and the newer has over 13,400 AMD Magny-Cours cores distributed across 32-core systems interconnected with QDR InfiniBand. We will describe the design and operation of these clusters, their performance, the benchmarking data used to select the hardware, and the techniques used to handle their NUMA architecture. We will also discuss the design, operation, and performance of a GPU-accelerated cluster that Fermilab will deploy in late November 2011. This cluster will have 152 NVIDIA Fermi GPUs distributed across 76 servers coupled with QDR InfiniBand. In the last several years, GPUs have been used to increase the throughput of some LQCD simulations more than tenfold compared with conventional hardware of the same cost. These LQCD codes have evolved from using a single GPU, to using multiple GPUs within a server, and now to using multiple GPUs distributed across a cluster. The primary goal of this cluster's design is to optimize LQCD simulations that use large numbers of GPUs.

Author

Dr Don Holmgren (Fermilab)

Co-authors

Mr Amitoj Singh (Fermilab)
Dr James Simone (Fermilab)
Mr Nirmal Seenu (Fermilab)
