21–25 May 2012
New York City, NY, USA
US/Eastern timezone

MPI support in the DIRAC Pilot Job Workload Management System

22 May 2012, 13:30
4h 45m
Rosenthal Pavilion (10th floor) (Kimmel Center)

Poster Distributed Processing and Analysis on Grids and Clouds (track 3) Poster Session

Speaker

Ms Vanessa Hamar (CPPM-IN2P3-CNRS)

Description

Parallel job execution in the grid environment using MPI technology presents a number of challenges for the sites providing this support. Multiple flavors of MPI libraries, shared working directories required by certain applications, and special batch system settings make MPI support difficult for site managers. Meanwhile, workload management systems based on pilot jobs have become ubiquitous, yet the pilot frameworks did not support MPI applications. This support was recently added to the DIRAC Project in the context of the GISELA Latin American Grid. Special services for the dynamic allocation of virtual computer pools on grid sites were developed in order to deploy MPI rings that match the requirements of the jobs in the central task queue of the DIRAC Workload Management System. The required MPI software is installed automatically by the pilot agents using user-space file system techniques. The same technique is used to emulate shared working directories for the parallel MPI processes. This makes it possible to execute MPI jobs even on sites that do not officially support them. Reusing MPI rings constructed in this way to execute a series of parallel jobs dramatically improves their efficiency and turnaround. In this contribution we will describe the design and implementation of the DIRAC MPI Service as well as its support for various types of MPI libraries. The advantages of coupling MPI support with the pilot frameworks will be outlined, and examples of usage with real applications will be presented.
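
The idea of matching queued jobs to MPI rings and reusing a ring for later jobs can be illustrated with a small sketch. This is not DIRAC code: the names `MPIJob`, `Ring`, and `schedule`, the flavor strings, and the simplification that jobs run one after another on a ring are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class MPIJob:
    job_id: str
    flavor: str   # hypothetical MPI flavor label, e.g. "openmpi"
    n_procs: int  # number of parallel processes the job needs


@dataclass
class Ring:
    flavor: str
    slots: int
    jobs_run: list = field(default_factory=list)


def schedule(queue, free_pilots, rings):
    """Assign queued jobs to rings, building a new ring only when needed.

    Assumption: jobs on the same ring run sequentially, so an existing ring
    with the right flavor and enough slots can always be reused.
    Returns (assignments, remaining free pilots).
    """
    assignments = {}
    for job in queue:
        # Prefer an already deployed ring that satisfies the job.
        ring = next(
            (r for r in rings
             if r.flavor == job.flavor and r.slots >= job.n_procs),
            None,
        )
        # Otherwise build a new ring if enough free pilots are available.
        if ring is None and free_pilots >= job.n_procs:
            ring = Ring(job.flavor, job.n_procs)
            rings.append(ring)
            free_pilots -= job.n_procs
        if ring is not None:
            ring.jobs_run.append(job.job_id)
            assignments[job.job_id] = ring
        # Jobs that cannot be matched stay in the queue (not assigned here).
    return assignments, free_pilots


if __name__ == "__main__":
    queue = [
        MPIJob("A", "openmpi", 4),
        MPIJob("B", "openmpi", 4),
        MPIJob("C", "mpich2", 8),
    ]
    rings = []
    assignments, left = schedule(queue, free_pilots=6, rings=rings)
    for job_id, ring in assignments.items():
        print(job_id, "->", ring.flavor, "ring with", ring.slots, "slots")
    print("free pilots left:", left)
```

In this toy run, jobs A and B share one ring, so the ring setup cost is paid once, while job C waits because too few free pilots remain to build a ring for it.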

Primary authors

Dr Andrei Tsaregorodtsev (Universite d'Aix - Marseille II (FR))
Ms Vanessa Hamar (CPPM-IN2P3-CNRS)
