Anna Elizabeth Woodard (University of Notre Dame (US)), Matthias Wolf (University of Notre Dame (US))
Individual scientists in high energy physics experiments like the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider require extensive computing resources to analyse massive data sets. The majority of this analysis work is done at dedicated grid-enabled CMS computing facilities. University campuses offer considerable additional computing resources, but these are not specifically configured to run CMS software. Furthermore, in many cases the machines are available for general use only while they are idle, and opportunistic jobs can be terminated at any time, leading to a highly volatile computing environment. As a joint effort involving computer scientists and CMS physicists at Notre Dame, we have developed an opportunistic workflow management tool, Lobster, to harvest available cycles from university campus computing pools. The Lobster framework consists of a management server, a file server, and workers submitted to any available computing resource. Only standard user permissions are required to run the entire suite of processes, making it possible to use this tool on any resource where the user has permission to run jobs. Lobster uses the Work Queue system for task management, while the CMS-specific software environment is provided via CVMFS and Parrot. Data is handled via Chirp and Hadoop for local data storage and XrootD for access to the CMS wide-area data federation. An extensive set of monitoring and diagnostic tools has been developed to facilitate system optimisation. The tool has been tested in a variety of environments, including within OSG Connect. We have tested it at large scale using the 20,000-core cluster at Notre Dame, achieving approximately 8,000 tasks running simultaneously while sustaining approximately 9 Gbit/s of input data and 340 Mbit/s of output data.
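
To put the quoted scale figures in perspective, the short Python sketch below derives average per-task data rates from the approximate numbers in the abstract. It is illustrative arithmetic only: it assumes decimal units (1 Gbit = 1000 Mbit), that bandwidth is spread evenly over all running tasks, and that each task occupies one core. None of these assumptions is stated in the abstract, and the function name is hypothetical.

```python
# Approximate figures quoted in the abstract.
CLUSTER_CORES = 20_000
CONCURRENT_TASKS = 8_000
INPUT_GBIT_PER_S = 9.0
OUTPUT_MBIT_PER_S = 340.0


def per_task_rates(tasks, input_gbit_s, output_mbit_s):
    """Return the average (input in Mbit/s, output in kbit/s) per running task.

    Assumes decimal units and an even split of bandwidth across tasks.
    """
    input_mbit_s = input_gbit_s * 1000.0 / tasks
    output_kbit_s = output_mbit_s * 1000.0 / tasks
    return input_mbit_s, output_kbit_s


in_mbit, out_kbit = per_task_rates(
    CONCURRENT_TASKS, INPUT_GBIT_PER_S, OUTPUT_MBIT_PER_S
)

# Assumption: one core per task (not stated in the abstract).
core_fraction = CONCURRENT_TASKS / CLUSTER_CORES

print(f"average input per task:  {in_mbit:.3f} Mbit/s")
print(f"average output per task: {out_kbit:.1f} kbit/s")
print(f"share of cluster cores:  {core_fraction:.0%}")
```

Under these assumptions, each task reads on the order of 1 Mbit/s and writes a few tens of kbit/s on average, which suggests that the aggregate load on the data federation and the local storage comes from the number of simultaneous tasks rather than from any single task.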