27 September 2004 to 1 October 2004
Interlaken, Switzerland
Developing & Managing a large Linux farm - the Brookhaven Experience

27 Sep 2004, 14:20
This presentation describes the experiences and lessons learned by the RHIC/ATLAS Computing Facility (RACF) in building and managing its 2,700+ CPU (and growing) Linux Farm over the past 6+ years. We describe how hardware cost, end-user needs, infrastructure, footprint, hardware configuration, vendor selection, software support and other considerations have steered the growth of the RACF Linux Farm, and how they help shape our future hardware purchase decisions. We also provide a detailed description of the challenges encountered and the solutions used in managing and configuring a large, heterogeneous Linux Farm (2,700+ CPUs) in the midst of an ongoing transition from a generally local resource to a global, Grid-aware resource within a larger, distributed computing environment.

Primary authors

A. Withers (Brookhaven National Lab)
C. Hollowell (Brookhaven National Lab)
J. Smith (Brookhaven National Lab)
O. Rind (Brookhaven National Lab)
R. Hogue (Brookhaven National Lab)
S. Misawa (Brookhaven National Lab)
T. Chan (Brookhaven National Lab)
Tomasz WLODEK (BNL)

