Speaker
S. Dasu
(UNIVERSITY OF WISCONSIN)
Description
The University of Wisconsin distributed computing research groups
developed a software system called Condor for high throughput computing
using commodity hardware. An adaptation of this software, Condor-G, is
part of the Globus grid computing toolkit. However, the original Condor
has additional features that allow an enterprise-level grid to be built.
Several UW departments have Condor computing pools that are integrated
in such a way as to flock jobs from one pool to another as resources
become available. An interdisciplinary team of UW researchers recently
built a new distributed computing facility, the Grid Laboratory of
Wisconsin (GLOW). In total, the Condor pools at UW have about 2000 Intel
CPUs (P-III and Xeon) that are available for scientific computation.
By exploiting special features of Condor such as checkpointing and
remote I/O, we have generated over 10 million fully simulated CMS events.
We were able to harness about 260 CPU-days per day over a period of two
months while we were operational in late fall. We have scaled to using 500
CPUs concurrently when the opportunity arose to exploit unused resources in
laboratories on our campus. We have built a scalable job submission and
tracking system called Jug, using Python and MySQL, which enabled us to
run hundreds of jobs simultaneously. Jug also ensured that the
generated data were transferred to the US Tier-I center at Fermilab. We have
also built a portal to our resources and participated in the Grid2003
project. We are currently adapting our environment to provide
analysis resources. In this paper we discuss our experience and
observations regarding the use of opportunistic resources, and
generalize them to the wider grid computing context.
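
As an illustration of the kind of database-backed job tracking described
above, the following is a minimal sketch only. It is not Jug's actual
design or schema: the table layout, the state names, and every function
name are assumptions made for this example, and Python's built-in sqlite3
module stands in for MySQL so that the sketch is self-contained.

```python
import sqlite3

# Hypothetical job states for this sketch; not taken from Jug.
STATES = ("queued", "running", "done", "transferred", "failed")


def open_tracker(path=":memory:"):
    """Open (or create) the tracking database and its single jobs table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT UNIQUE NOT NULL,"
        " state TEXT NOT NULL DEFAULT 'queued',"
        " attempts INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def add_job(db, name):
    """Register a new job in the 'queued' state."""
    with db:  # commits on success, rolls back on error
        db.execute("INSERT INTO jobs (name) VALUES (?)", (name,))


def set_state(db, name, state):
    """Record a state change; each move to 'running' counts as an attempt."""
    if state not in STATES:
        raise ValueError(f"unknown state: {state}")
    with db:
        db.execute("UPDATE jobs SET state = ? WHERE name = ?", (state, name))
        if state == "running":
            db.execute(
                "UPDATE jobs SET attempts = attempts + 1 WHERE name = ?",
                (name,),
            )


def retry_failed(db):
    """Requeue every failed job and return how many were requeued."""
    with db:
        cur = db.execute(
            "UPDATE jobs SET state = 'queued' WHERE state = 'failed'"
        )
    return cur.rowcount


def summary(db):
    """Return a {state: count} mapping for all tracked jobs."""
    return dict(db.execute("SELECT state, COUNT(*) FROM jobs GROUP BY state"))
```

Keeping the job state in a database rather than in the memory of the
submitting process means that a restarted submitter can call summary() to
see what is outstanding and retry_failed() to requeue lost work, which
matters when jobs run on opportunistic resources that can be reclaimed at
any time.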
Authors
D. Bradley
(UNIVERSITY OF WISCONSIN)
M. Livny
(UNIVERSITY OF WISCONSIN)
S. Dasu
(UNIVERSITY OF WISCONSIN)
S. Rader
(UNIVERSITY OF WISCONSIN)
V. Puttabuddhi
(UNIVERSITY OF WISCONSIN)
W. Smith
(UNIVERSITY OF WISCONSIN)