Speaker
Don Petravick
Description
As part of the DOE SciDAC "National Infrastructure for Lattice Gauge
Computing" project, Fermilab builds and operates production clusters for
lattice QCD simulations. We currently operate three clusters: a 128-node
dual-Xeon Myrinet cluster, a 128-node Pentium 4E Myrinet cluster, and a 32-node
dual-Xeon InfiniBand cluster. We will discuss the operation of these systems
and examine their performance in detail. We will describe the uniform user
runtime environment emerging from the SciDAC collaboration.
The design of lattice QCD clusters requires careful attention to
balancing memory bandwidth, floating-point throughput, and network
performance. We will discuss our investigations of various commodity
processors, including Pentium 4E, Xeon, Itanium2, Opteron, and PPC970, in
terms of their suitability for building balanced QCD clusters. We will also
discuss our early experiences with the emerging InfiniBand and PCI Express
architectures. Finally, we will examine historical trends in the
price-to-performance ratios of lattice QCD clusters and present our
predictions and plans for future clusters.