Pushing HTCondor and glideinWMS to 200K+ Jobs in a Global Pool for CMS before LHC Run 2

Not scheduled
15m
OIST

1919-1 Tancha, Onna-son, Kunigami-gun Okinawa, Japan 904-0495
Poster presentation, Track 4: Middleware, software development and tools, experiment frameworks, tools for distributed computing

Speaker

James Letts (Univ. of California San Diego (US))

Description

The CMS experiment at the LHC relies on HTCondor and glideinWMS as its primary batch and pilot-based Grid provisioning system. So far we have been running several independent resource pools, but we are working on unifying them into a single Global Pool to reduce the operational load and to share resources more effectively between the various activities in CMS. The major challenge of this unification is scale: the combined pool is expected to reach 200K job slots, significantly more than any other multi-user HTCondor-based system currently in production. To get there, we have studied the scaling limitations of our existing pools, the largest of which tops out at about 70K slots. This work provided valuable feedback to the development communities, who responded with improvements that have let us reach progressively higher scales with greater stability. We have also worked on improving the organization and support model for this critical service during Run 2 of the LHC. This contribution will present the results of the scale testing and our experiences from the first months of running the Global Pool.

Primary author

James Letts (Univ. of California San Diego (US))

Co-authors

Alison Mc Crea (Univ. of California San Diego (US))
Brian Paul Bockelman (University of Nebraska (US))
Dr David Alexander Mason (Fermi National Accelerator Lab. (US))
Farrukh Aftab Khan (National Centre for Physics (PK))
Igor Sfiligoi (Univ. of California San Diego (US))
Justas Balcas (Vilnius University (LT))
Krista Majewski (Fermi National Accelerator Lab. (US))
Luis Emiro Linares Garcia (Universidad de los Andes (CO))
Marco Mascheroni (Universita & INFN, Milano-Bicocca (IT))
Maria Dolores Saiz Santos (Univ. of California San Diego (US))
Oliver Gutsche (Fermi National Accelerator Lab. (US))
Stefano Belforte (Universita e INFN (IT))