CHEP 2016 Conference, San Francisco, October 8-14, 2016

Start: 2016-10-10T08:00:00-07:00
End: 2016-10-14T18:00:00-07:00
Location: San Francisco Marriott Marquis
Timezone: America/Los_Angeles

Stability and scalability of the CMS Global Pool: Pushing HTCondor and glideinWMS to new limits

11 Oct 2016, 11:15 (15 min)

GG C2 (San Francisco Marriott Marquis)

Oral, Track 3: Distributed Computing

Speaker: Antonio Perez-Calero Yzquierdo (Centro de Investigaciones Energ. Medioambientales y Tecn. - (ES))

The CMS Global Pool, based on HTCondor and glideinWMS, is the main computing resource provisioning system for all CMS workflows, including analysis, Monte Carlo production, and detector data reprocessing. Resources pledged to CMS at Tier-1 and Tier-2 sites exceed 100,000 CPU cores, and a further 50,000-100,000 CPU cores are available opportunistically, pushing the Global Pool to larger scales each year. These resources are also becoming more diverse in how they are accessed and configured. Running stably at ever higher scales while introducing new modes of operation, such as multi-core pilots, places a heavy strain on the submission infrastructure, as does the chaotic nature of physics analysis workflows. This paper details some of the most important scalability and stability challenges the Global Pool has faced since the beginning of LHC Run II, and how they were overcome.

Primary keyword: Computing facilities
Secondary keyword: Computing middleware
Tertiary keyword: Distributed workload management

Author: James Letts (Univ. of California San Diego (US))

Co-authors: Anthony Tiradani (Fermilab); Antonio Perez-Calero Yzquierdo (Centro de Investigaciones Energ. Medioambientales y Tecn. - (ES)); Brian Paul Bockelman (University of Nebraska (US)); David Alexander Mason (Fermi National Accelerator Lab. (US)); Dirk Hufnagel (Fermi National Accelerator Lab. (US)); Farrukh Aftab Khan (National Centre for Physics (PK)); Jadir Marra Da Silva (UNESP - Universidade Estadual Paulista (BR)); Justas Balcas (California Institute of Technology (US)); Kenyi Paolo Hurtado Anampa (University of Notre Dame (US)); Krista Larson (Fermi National Accelerator Lab. (US)); Marco Mascheroni (Fermi National Accelerator Lab. (US)); Vassil Verguilov (Bulgarian Academy of Sciences (BG))

Presentation materials: Highlights_403.pdf, Oral_403.pdf
