Exploiting Volatile Opportunistic Computing Resources as a CMS User

Not scheduled
15m
OIST

1919-1 Tancha, Onna-son, Kunigami-gun Okinawa, Japan 904-0495
Poster presentation, Track 5: Computing activities and Computing models

Speakers

Anna Elizabeth Woodard (University of Notre Dame (US)), Matthias Wolf (University of Notre Dame (US))

Description

Individual scientists in high energy physics experiments such as the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider require extensive computing resources to analyse massive data sets. The majority of this analysis work is done at dedicated grid-enabled CMS computing facilities. University campuses offer considerable additional computing resources, but these are not specifically configured to run CMS software. Furthermore, such machines are often available only when they are idle, and opportunistic jobs can be terminated at any time, leading to a highly volatile computing environment. As a joint effort involving computer scientists and CMS physicists at Notre Dame, we have developed Lobster, an opportunistic workflow management tool that harvests available cycles from university campus computing pools. The Lobster framework consists of a management server, a file server, and workers submitted to any available computing resource. Only standard user permissions are required to run the entire suite of processes, so the tool can be used on any resource where the user is allowed to run jobs. Lobster uses the Work Queue system for task management, while the CMS-specific software environment is provided via CVMFS and Parrot. Data handling uses Chirp and Hadoop for local data storage and XRootD for access to the CMS wide-area data federation. An extensive set of monitoring and diagnostic tools has been developed to facilitate system optimisation. The tool has been tested in a variety of environments, including within OSG Connect. At large scale, using the 20,000-core cluster at Notre Dame, we have sustained approximately 8000 simultaneously running tasks, with approximately 9 Gbit/s of input data and 340 Mbit/s of output data.
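As context for the task-management layer mentioned above, the following is a minimal sketch of master-side task submission using the Work Queue Python bindings from cctools. It is not Lobster's actual code: the cmsRun command, file names, loop count, and port number are placeholders chosen for illustration only.

```python
# Sketch: submit placeholder tasks to a Work Queue master and collect results.
# Assumes the cctools Work Queue Python bindings are installed.
from work_queue import WorkQueue, Task, WORK_QUEUE_CACHE

q = WorkQueue(port=9123)              # workers running on campus nodes connect to this port
print("Work Queue master listening on port %d" % q.port)

for i in range(10):
    # Placeholder CMS job; Lobster wraps the real payload in its own runtime environment.
    t = Task("cmsRun config.py")
    t.specify_input_file("config.py", flags=WORK_QUEUE_CACHE)   # cached on the worker across tasks
    t.specify_output_file("out_%d.root" % i, "out.root")        # local name on master, remote name on worker
    q.submit(t)

while not q.empty():
    t = q.wait(60)                    # returns a finished task, or None on timeout
    if t:
        print("task %d finished with exit code %d" % (t.id, t.return_status))
```

Workers are started separately (for example with the work_queue_worker command) on whatever opportunistic nodes are available; if a node is reclaimed, the master simply reschedules the lost task on another worker, which is the property that makes this model suitable for volatile campus resources.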

Primary authors

Anna Elizabeth Woodard (University of Notre Dame (US)), Charles Nicholas Mueller (University of Notre Dame (US)), Matthias Wolf (University of Notre Dame (US))

Co-authors

Ben Tovar (University of Notre Dame (US)), Prof. Douglas Thain (University of Notre Dame), Kevin Patrick Lannon (University of Notre Dame (US)), Mike Hildreth (University of Notre Dame (US)), Patrick Donnelly (University of Notre Dame (US)), Paul Brenner (University of Notre Dame (US))