S. Dasu (UNIVERSITY OF WISCONSIN)
The University of Wisconsin distributed computing research groups developed a software system called Condor for high-throughput computing on commodity hardware. An adaptation of this software, Condor-G, is part of the Globus grid computing toolkit. However, the original Condor has additional features that allow an enterprise-level grid to be built. Several UW departments have Condor computing pools that are integrated so that jobs can flock from one pool to another as resources become available. An interdisciplinary team of UW researchers recently built a new distributed computing facility, the Grid Laboratory of Wisconsin (GLOW). In total, the Condor pools at UW have about 2000 Intel CPUs (P-III and Xeon) available for scientific computation. By exploiting special features of Condor such as checkpointing and remote I/O, we have generated over 10 million fully simulated CMS events. We were able to harness about 260 CPU-days per day over a period of 2 months while we were operational in late fall. We have scaled to using 500 CPUs concurrently when the opportunity arose to exploit unused resources in laboratories on our campus. We have built a scalable job submission and tracking system called Jug, using Python and MySQL, which enabled us to run hundreds of jobs simultaneously. Jug also ensured that the generated data were transferred to the US Tier-I center at Fermilab. We have also built a portal to our resources and participated in the Grid2003 project. We are currently adapting our environment to provide analysis resources. In this paper we discuss our experience and observations regarding the use of opportunistic resources, and generalize them to the wider grid computing context.