14–18 Oct 2013
Amsterdam, Beurs van Berlage
Europe/Amsterdam timezone

Minimizing draining waste through extending the lifetime of pilot jobs in Grid environments

17 Oct 2013, 12:06
22m
Keurzaal (Amsterdam, Beurs van Berlage)


Oral presentation to parallel session Distributed Processing and Data Handling B: Experiment Data Processing, Data Handling and Computing Models

Speaker

Igor Sfiligoi (University of California San Diego)

Description

The computing landscape is moving at an accelerated pace to many-core computing; nowadays it is not unusual to get 32 cores on a single physical node. As a consequence, there is increasing pressure in the pilot systems domain to move beyond purely single-core scheduling and to support multi-core jobs as well. To allow a gradual transition from single-core to multi-core user jobs, it is envisioned that pilot jobs will have to handle both kinds of user jobs at the same time, by requesting several cores at a time from Grid providers and then partitioning them among the user jobs at runtime. Unfortunately, the current Grid ecosystem allows only relatively short pilot job lifetimes, which requires frequent draining; because user jobs have varying lifetimes, cores freed by shorter user jobs sit idle while the pilot waits for its longer-running user jobs to finish, wasting compute resources. Significantly extending the lifetime of pilot jobs is thus highly desirable, but it must come without any adverse effects for the Grid resource providers. In this paper we present a mechanism, based on communication between the pilot jobs and the Grid provider, that allows pilot jobs to run for extended periods of time when resources are available, while still allowing the Grid provider to reclaim the resources at short notice when needed. We also present our experience running a prototype system that uses this mechanism on a couple of US-based Grid sites.
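The draining waste described above can be made concrete with a back-of-the-envelope model. The sketch below is illustrative only and is not the authors' mechanism or their measurements: it assumes a multi-core pilot that stops accepting new user jobs at a drain point and releases its whole allocation only when its slowest core finishes. The function names (draining_waste, waste_fraction) and the example runtimes are hypothetical.

```python
from typing import Sequence


def draining_waste(finish_times_h: Sequence[float]) -> float:
    """Idle core-hours accumulated while a multi-core pilot drains.

    Hypothetical model: once the pilot stops accepting new user jobs, entry i
    is the time (hours after the drain starts) at which core i's last user job
    finishes. The pilot releases its whole allocation only when the slowest
    core is done, so every other core idles until then.
    """
    if not finish_times_h:
        raise ValueError("need at least one core")
    release_h = max(finish_times_h)
    return sum(release_h - t for t in finish_times_h)


def waste_fraction(finish_times_h: Sequence[float], hours_before_drain: float) -> float:
    """Share of the pilot's allocated core-hours lost to draining.

    Assumes every core was kept busy for `hours_before_drain` hours before
    the drain started (a simplification; real pilots can idle for other reasons).
    """
    cores = len(finish_times_h)
    allocated = cores * (hours_before_drain + max(finish_times_h))
    return draining_waste(finish_times_h) / allocated


if __name__ == "__main__":
    # Hypothetical 8-core pilot whose last user jobs end 1, 2, ..., 8 hours after draining starts.
    finish = [float(h) for h in range(1, 9)]
    print(draining_waste(finish))                  # 28.0 idle core-hours
    print(f"{waste_fraction(finish, 24.0):.3f}")   # 0.109 (pilot busy 1 day before draining)
    print(f"{waste_fraction(finish, 168.0):.3f}")  # 0.020 (pilot busy 1 week before draining)
```

Under these assumptions the idle core-hours of a drain are set by the spread of user-job runtimes, not by the pilot lifetime, so a longer-lived pilot amortizes the same drain over more useful work. This is the motivation the abstract gives for extending pilot lifetimes.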

Primary authors

Brian Paul Bockelman (University of Nebraska Lincoln)
Frank Wurthwein (University of California San Diego)
Igor Sfiligoi (University of California San Diego)
Terrence Martin (University of California San Diego)
