1. Short overview
This year, the IV Grid Plugtests took place in Beijing, China, from October 29 to November 1, 2007. Organized by ETSI and INRIA, the event included a contest on the N-Queens problem designed to test grid technologies.
We report on our experience running our N-Queens application efficiently on an entire computing grid, Grid'5000, by composing tools that cover every step from node reservation and deployment to task scheduling.
To run our N-Queens application on the grid, we combined three tools: ProActive, TakTuk, and Kaapi. The Plugtests organizers described the grid's architecture in a deployment descriptor file containing the information required to reserve and contact nodes (gateways, resource managers). ProActive was in charge of reserving all the nodes and creating a tunnel to each cluster of the grid. TakTuk then used these tunnels to connect to all the nodes of all the clusters and start the Kaapi processes.
Our N-Queens application ran successfully during the Plugtests. We deployed our Kaapi processes on 1364 nodes of Grid'5000 (one process per node) in under 3 minutes. The computation used 3654 cores (each Kaapi process creates one computation thread per core). Using this deployment during the one-hour slot, we computed all the solutions of one 23-Queens instance (35 min 7 s) and of six 22-Queens instances (about 2 min 21 s each). These results earned us first place in the contest.
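To make the application side more concrete, the following is a minimal Python sketch of counting all N-Queens solutions by splitting the search tree into independent tasks. It assumes a simple prefix decomposition (all valid placements of the first `depth` rows become separate tasks) and a bitmask backtracking solver; the names `split_tasks`, `count_from`, and `count_queens` are hypothetical. This is not the Kaapi implementation, which balances load through work stealing rather than a fixed split.

```python
"""Minimal sketch: count N-Queens solutions by splitting the search into independent tasks.

Assumption: a fixed prefix decomposition, not Kaapi's work-stealing scheduler.
"""
from typing import List, Tuple

State = Tuple[int, int, int]  # (occupied columns, left-diagonal mask, right-diagonal mask)


def count_from(n: int, cols: int, diag1: int, diag2: int) -> int:
    """Count all completions of a partial placement with bitmask backtracking."""
    full = (1 << n) - 1
    if cols == full:  # every column holds a queen: one complete solution
        return 1
    count = 0
    free = full & ~(cols | diag1 | diag2)  # columns not attacked in the next row
    while free:
        bit = free & -free  # lowest free column
        free ^= bit
        count += count_from(n, cols | bit,
                            ((diag1 | bit) << 1) & full,
                            (diag2 | bit) >> 1)
    return count


def split_tasks(n: int, depth: int) -> List[State]:
    """Enumerate every valid placement of the first `depth` rows; each is an independent task."""
    full = (1 << n) - 1
    tasks: List[State] = [(0, 0, 0)]
    for _ in range(min(depth, n)):
        next_tasks: List[State] = []
        for cols, diag1, diag2 in tasks:
            free = full & ~(cols | diag1 | diag2)
            while free:
                bit = free & -free
                free ^= bit
                next_tasks.append((cols | bit,
                                   ((diag1 | bit) << 1) & full,
                                   (diag2 | bit) >> 1))
        tasks = next_tasks
    return tasks


def count_queens(n: int, depth: int = 2) -> int:
    """Sum the counts of all tasks; on a grid, each task could run on a different core."""
    return sum(count_from(n, *task) for task in split_tasks(n, depth))


if __name__ == "__main__":
    for n in (4, 8, 10):
        print(n, count_queens(n))
```

Raising `depth` yields more, smaller tasks (8 tasks for `n = 8, depth = 1`, 6 for `n = 4, depth = 2`); a scheduler trades this granularity against per-task overhead, and work stealing avoids having to choose it statically.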
4. Conclusions / Future plans
We learned two main lessons from this experience:
- The Kaapi middleware allows us to scale up to thousands of heterogeneous cores while preserving efficiency. Ongoing work aims to improve scalability on highly heterogeneous networks.
- Fault tolerance is essential for running applications at such a scale. Several times during the contest, our application crashed because some grid nodes failed. Two fault-tolerance protocols are currently under development for Kaapi.
Provide a set of generic keywords that define your contribution (e.g. Data Management, Workflows, High Energy Physics)
Grid, Deployment, Work-stealing scheduling, Tools for grids
URL for further information: