CHEP 07

Name: CHEP 07
Start: 2007-09-02T08:00:00+02:00
End: 2007-09-09T12:00:00+02:00
Location: Victoria, Canada

Europe/Zurich timezone

Please book accommodation as soon as possible.

Support

chep07-support@triumf.ca

glideinWMS - A generic pilot-based Workload Management System

4 Sept 2007, 11:40 (20 min)

Carson Hall C (Victoria, Canada)

Oral presentation, track: Grid middleware and tools

Speaker: Mr Igor Sfiligoi (FNAL)

Grids are making it possible for Virtual Organizations (VOs) to run hundreds of thousands of jobs per day. However, the resources are distributed among hundreds of independent Grid sites, so a higher-level Workload Management System (WMS) is necessary.

glideinWMS is a pilot-based WMS and inherits several useful features from that approach:
1) Late binding: Pilots are sent to all suitable Grid sites. Real jobs are selected for those resources only once the pilots start, so no forecasting is needed.
2) Reliability: A broken Grid site will either kill the pilot jobs, or the pilots will detect the problem at startup. Real jobs start only on well-behaved resources.
3) Grid-wide fair share: The relative priorities between jobs of the same VO are set inside the WMS. Grid sites only manage priorities between different VOs.

glideinWMS is based on the Condor glidein concept, i.e. a regular Condor pool in which the Condor daemons (startd) are started by pilot jobs. The real jobs are vanilla, standard or MPI universe jobs.

glideinWMS is composed of Glidein Factories and VO Frontends, which communicate using Condor ClassAds:
* Factories publish the available Grid sites,
* Frontends match the Grid attributes to the job attributes and publish a request for a stream of glideins to suitable Grid sites,
* Factories pick up the requests and submit the glideins.

A detailed description of the system will be presented, along with the systems currently deployed in the USCMS production and user analysis frameworks. Integration with the frameworks of other VOs will also be presented, as well as the measured scalability limits.
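The Factory/Frontend cycle and the late-binding behavior described in the abstract can be illustrated with a toy, in-memory model. This is a minimal sketch under stated assumptions, not glideinWMS code and not the Condor API: plain Python dicts stand in for ClassAds, and the site names, the attributes Arch and OS, and the functions factory_publish, frontend_request, factory_submit and pilot_start are all hypothetical. In the real system a glidein starts a Condor startd that joins a pool, and Condor itself matches jobs to it; the sketch collapses that step into pilot_start.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Toy model only: plain dicts stand in for Condor ClassAds, and all
# attribute names (Arch, OS) and function names here are hypothetical.
Ad = Dict[str, str]


@dataclass
class Job:
    job_id: int
    requirements: Ad             # attribute -> required value
    site: Optional[str] = None   # set only when a running pilot binds the job


def matches(job: Job, site_ad: Ad) -> bool:
    """A job fits a site if the site ad has every required attribute value."""
    return all(site_ad.get(key) == value for key, value in job.requirements.items())


def factory_publish(sites: Dict[str, Ad]) -> List[Ad]:
    """Factory: publish one ad per available Grid site."""
    return [dict(attrs, Name=name) for name, attrs in sites.items()]


def frontend_request(site_ads: List[Ad], idle_jobs: List[Job]) -> Dict[str, int]:
    """Frontend: match site attributes to job attributes and request, per
    site, one glidein for each idle job that could run there. A job that
    fits several sites is counted for each, so pilots go to all of them."""
    requests: Dict[str, int] = {}
    for ad in site_ads:
        wanted = sum(1 for job in idle_jobs if matches(job, ad))
        if wanted:
            requests[ad["Name"]] = wanted
    return requests


def factory_submit(requests: Dict[str, int]) -> List[str]:
    """Factory: pick up the requests and submit one glidein per unit."""
    return [site for site, count in requests.items() for _ in range(count)]


def pilot_start(site_ad: Ad, healthy: bool, idle_jobs: List[Job]) -> Optional[Job]:
    """Late binding: a real job is chosen only after the pilot is running.
    A pilot that finds its node broken exits without taking any job."""
    if not healthy:
        return None
    for job in idle_jobs:
        if matches(job, site_ad):
            job.site = site_ad["Name"]
            idle_jobs.remove(job)  # safe: we return right after removing
            return job
    return None  # nothing left to run: the pilot simply exits


def demo():
    sites = {
        "SiteA": {"Arch": "x86_64", "OS": "SL4"},
        "SiteB": {"Arch": "x86_64", "OS": "SL5"},
    }
    idle = [Job(1, {"OS": "SL4"}), Job(2, {"Arch": "x86_64"})]
    ads = factory_publish(sites)
    requests = frontend_request(ads, idle)
    pilots = factory_submit(requests)
    ad_by_name = {ad["Name"]: ad for ad in ads}
    healthy = {"SiteA": False, "SiteB": True}  # pretend SiteA is broken
    started = [pilot_start(ad_by_name[s], healthy[s], idle) for s in pilots]
    ran = {job.job_id: job.site for job in started if job is not None}
    return requests, pilots, ran, [job.job_id for job in idle]


if __name__ == "__main__":
    print(demo())
    # ({'SiteA': 2, 'SiteB': 1}, ['SiteA', 'SiteA', 'SiteB'], {2: 'SiteB'}, [1])
```

In this run the broken SiteA only consumes pilots: job 2 is bound to SiteB, and job 1 (which needs SL4) simply stays idle instead of failing. Counting a job once per matching site deliberately over-provisions pilots, which late binding tolerates because an unused pilot just exits; how the real Frontend sizes its requests is not modeled here.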

Author: Mr Igor Sfiligoi (FNAL)

Slides

glideinWMS_talk_rc4.odp

glideinWMS_talk_rc4.pdf
