Speaker
Daniel Charles Bradley
(High Energy Physics)
Description
A number of recent enhancements to the Condor batch system have been stimulated by the challenges of LHC computing. The result is a more robust, scalable, and flexible computing platform. One product of this effort is the Condor JobRouter, which serves as a high-throughput scheduler for feeding multiple queues (e.g., grid queues) from a single input job queue. We describe its principles and how it has been used at large scale in CMS production on the Open Science Grid. Improved scalability of Condor is another welcome advance. We describe the scaling characteristics of the Condor batch system under large workloads and when integrating large pools of resources; we then detail how LHC physicists have benefited directly from the expanded scaling regime. Finally, we present some practical configurations that we have used to take advantage of Condor's adaptability: many flavors of prioritization, policies for sharing resources in a campus grid, and initial support for a mix of single-core and multi-core jobs.
Author
Daniel Charles Bradley
(High Energy Physics)
Co-authors
Greg Thain
(University of Wisconsin)
Prof. Miron Livny
(University of Wisconsin)
Prof. Sridhara Dasu
(University of Wisconsin)
Todd Tannenbaum
(University of Wisconsin)