Edgar Fajardo Hernandez (Univ. of California San Diego (US))
The HTCondor batch system is heavily used in the HEP community as the batch system for several WLCG resources. Moreover, it is the backbone of GlideinWMS, the main pilot system used by CMS. To prepare for LHC Run 2, we are probing the scalability limits of new versions and configurations of HTCondor, with the goal of reaching at least 200,000 simultaneous running jobs in a single pool. A sleeper pool of this size was achieved without a major impact on real jobs by using only 10,000 real slots distributed across several WLCG sites. We will report on how this was done and on the impact it has on future scalability tests, not only of HTCondor but of any web-facing service. High configurability is one of the main strengths of HTCondor; in addition to the test conditions and the testbed topology, we include the suggested configuration options used to obtain the scaling results. Finally, we will list the features present in newer versions of HTCondor that allow for sustained operations at scales well beyond what was previously possible.
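The overcommit factor implied by these numbers is simple arithmetic: reaching 200,000 running jobs with only 10,000 real slots means each real slot has to carry about 20 sleeper jobs on average. The sketch below assumes sleepers are spread evenly over the real slots and rounds up so the target is always met. The helper name `sleepers_per_real_slot` is hypothetical and says nothing about how the testbed actually placed sleepers into slots.

```python
def sleepers_per_real_slot(target_running_jobs: int, real_slots: int) -> int:
    """Hypothetical helper: minimum average number of sleeper jobs each real
    slot must host so the pool reaches target_running_jobs (assumes an even
    spread of sleepers over real slots)."""
    if real_slots <= 0:
        raise ValueError("real_slots must be positive")
    # Ceiling division with integers: round up so the target is never undershot.
    return -(-target_running_jobs // real_slots)


if __name__ == "__main__":
    factor = sleepers_per_real_slot(200_000, 10_000)
    print(f"Each real slot must host {factor} sleeper jobs on average")
```

Rounding up matters only when the target is not an exact multiple of the slot count; for the figures above the division is exact.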
Anthony Tiradani (Fermilab), Brian Paul Bockelman (University of Nebraska (US)), Dr Burt Holzman (Fermi National Accelerator Lab. (US)), Dr David Alexander Mason (Fermi National Accelerator Lab. (US)), Mr Jaime Frey (University of Wisconsin), James Letts (Univ. of California San Diego (US)), Todd Tannenbaum (Univ of Wisconsin-Madison, Wisconsin, USA)