Description
The CDF Analysis Facility (CAF) has been in use since April 2002
and has successfully served hundreds of users on thousands of CPUs.
The original CAF used FBSNG as a batch manager.
In the current trend toward multisite deployment,
FBSNG was found to be a limiting factor,
so the CAF has been reimplemented to use Condor instead.
Condor is a more widely used batch system and
is well integrated with the emerging grid tools;
one of its most useful features is the ability to run seamlessly
on top of other batch systems.
The transition has brought us many additional benefits,
such as ease of installation, fault tolerance and
increased manageability of the cluster.
The CAF infrastructure has also been simplified considerably,
since Condor implements a number of features we had to
implement ourselves with FBSNG.
In addition, our users have found that Condor's fair share mechanism
provides a more equitable and predictable distribution of resources.
In this talk the Condor-based CAF will be presented,
with particular emphasis on the changes needed to run with Condor,
the problems encountered during the transition, and the advantages gained from it.
Some background and plans for the future, as well as results
from Condor scalability tests, will also be presented.
Primary authors
E. Lipeles
(University of California, San Diego)
F. Wuerthwein
(University of California, San Diego)
I. Sfiligoi
(INFN Frascati)
M. Neubauer
(University of California, San Diego)