IPGP, Institut de Physique du Globe de Paris, is one of the main research and educational institutions in the geosciences in France. It is in charge of several national observatories and of volcanic hazard monitoring for the French volcanoes in the Antilles and on Réunion Island.
The research and monitoring activities of IPGP involve the development of new data analysis and simulation methods. Some of its most important applications are written in Fortran 90, parallelized with MPI, and run on local resources or at national computing centers.
IPGP has been testing MPI on the Grid in recent years, first on DataGrid and then on EGEE, to evaluate the potential of production Grid technologies as complementary resources.
After an introduction to message passing, this presentation will discuss the problems encountered when working with MPI on EGEE and will then present the functionalities required to run other applications.
Since the beginning of MPI on the Grid, important improvements have been achieved thanks to the work of the TCG WG on MPI, which publishes guidelines for sites and users. When these recommendations are followed, users can work with MPI without having to know the details of the local installations. Unfortunately, too few sites support MPI and implement these recommendations. Moreover, the supplied MPI-START package is not very user-friendly to configure.
Two major IPGP applications, SPECFEM3D and SEMUM3D, have been ported to EGEE. For the latter, in July 2009, 26 CEs satisfied the requirements (ESR, MPICH, shared home directories), but the application ran on only 7 of them, sometimes after several attempts. Thankfully, it always worked on the CEs of Trinity College Dublin (IE), also in EELA, and on all IN2P3 CEs (FR).
Furthermore, other functionalities have to be implemented and documented in order to port more MPI applications to EGEE: for example, the ability to specify the memory per process (to avoid out-of-memory failures) and the maximum number of processes per node (to avoid network saturation), and the availability of high-performance networks (Myrinet, InfiniBand). These topics are under consideration within the MPI WG but are not yet implemented.
EGEE cannot yet be used as a production tool for MPI applications, because very few sites satisfy the basic requirements and very long turnaround times must be expected even when requesting only a few cores.