9–11 May 2007
Manchester, United Kingdom

MOTEUR grid-enabled data-intensive workflow manager

10 May 2007, 14:00
15m

Oral presentation, Workflow

Speaker

Dr Johan Montagnat (CNRS)

Report on the experience (or the proposed activity). It would be very important to mention key services which are essential for the success of your activity on the EGEE infrastructure.

MOTEUR is interfaced to several middleware stacks, including gLite. On the application side, MOTEUR enacts a workflow of application web services. Unlike traditional web services, the MOTEUR execution web service interfaces to the gLite Workload Management System (WMS) in order to trigger application computations on the grid infrastructure. At the higher level, the MOTEUR workflow engine interprets the workflow description graph. At run time, this graph is combined with the input data sets to instantiate individual computing tasks, each described by a JDL (Job Description Language) file. The execution service submits, monitors and retrieves each task, concurrently whenever possible. It improves reliability by resubmitting jobs on failure. MOTEUR uses a data provenance history tree to keep track of data transformations and to ensure coherent parallel execution. The architecture of MOTEUR is modular, and the grid interface can be adapted or exchanged without modifying the core workflow engine.
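
The submit, monitor, retrieve and resubmit cycle described above can be illustrated with a minimal Python sketch. This is not MOTEUR's code and it does not call gLite: the `submit` callable, the `TaskFailed` exception, the `make_flaky_submit` helper and the `max_attempts` and `max_workers` parameters are all hypothetical names introduced for illustration. Because the grid call is passed in as a plain function, the sketch also shows how the grid interface can be swapped without touching the scheduling logic.

```python
from concurrent.futures import ThreadPoolExecutor


class TaskFailed(Exception):
    """Raised by a (hypothetical) submit function when a grid job fails."""


def run_with_resubmission(submit, task, max_attempts=3):
    """Call submit(task) until it succeeds or max_attempts is reached.

    Returns (result, attempts_used). Re-raises the last TaskFailed
    if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return submit(task), attempt
        except TaskFailed:
            if attempt == max_attempts:
                raise


def run_all(submit, tasks, max_attempts=3, max_workers=4):
    """Run independent tasks concurrently; results keep the order of tasks."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(run_with_resubmission, submit, task, max_attempts)
            for task in tasks
        ]
        return [future.result() for future in futures]


def make_flaky_submit(failures_before_success):
    """Build a stand-in submit function that fails a fixed number of times per task."""
    remaining = dict(failures_before_success)

    def submit(task):
        if remaining.get(task, 0) > 0:
            remaining[task] -= 1
            raise TaskFailed(task)
        return f"output-of-{task}"

    return submit


if __name__ == "__main__":
    # "task-b" fails twice before succeeding; "task-a" succeeds at once.
    submit = make_flaky_submit({"task-b": 2})
    for result, attempts in run_all(submit, ["task-a", "task-b"]):
        print(result, "after", attempts, "attempt(s)")
```

In a real deployment the stand-in `submit` function would be replaced by a call into the chosen middleware, and the retry limit would be tuned to the failure rate observed on the infrastructure.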

With a forward look to future evolution, discuss the issues you have encountered (or that you expect) in using the EGEE infrastructure. Wherever possible, point out the experience limitations (both in terms of existing services or missing functionality)

MOTEUR provides a flexible, high-level framework for enacting scientific workflows. It enables a service-oriented description of applications on the batch-oriented EGEE infrastructure. The computing tasks triggered by MOTEUR may be of variable length. The current gLite WMS provides only limited support for the efficient execution of short tasks, which can easily penalize the total execution time. We have, however, successfully used SDJs (Short Deadline Jobs) to drastically improve performance in some cases.

Describe the added value of the Grid for the scientific/technical activity you (plan to) do on the Grid. This should include the scale of the activity and of the potential user community and the relevance for other scientific or business applications

Many scientific applications involve multiple computations over large data sets. In the medical imaging community, for instance, large image databases must be processed for large-scale studies such as epidemiological or statistical studies. MOTEUR is a service-based application enactment engine. It interfaces to different application services (Web Services, GridRPC servers) and provides a generic Web Service interface to embed non-specific code. It benefits from an SOA design to provide a flexible and compact application description framework that cleanly decouples processing from data sets. It exploits the iteration strategies of Scufl (Taverna's description language) to enable the description of complex data processing patterns. It transparently exploits different levels of parallelism and distributes computations over grid resources to optimize performance. In particular, it exploits the large-scale data parallelism available in many scientific data analysis procedures.
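
To make the idea of iteration strategies concrete, the following Python sketch shows the two strategies usually associated with Scufl, the dot product and the cross product. The abstract does not say which strategies MOTEUR relies on, so treating these two as the relevant ones is an assumption, and the function names and sample inputs are hypothetical. Each tuple produced stands for one independent service invocation, which is where the data parallelism comes from.

```python
from itertools import product


def dot_product(*inputs):
    """Pair items by position: the i-th item of every input feeds one task."""
    lengths = {len(values) for values in inputs}
    if len(lengths) > 1:
        raise ValueError("dot product needs inputs of equal length")
    return list(zip(*inputs))


def cross_product(*inputs):
    """Combine every item of each input with every item of the others."""
    return list(product(*inputs))


if __name__ == "__main__":
    images = ["image-1", "image-2"]   # hypothetical input data set
    parameters = [0.5, 1.0]           # hypothetical parameter sweep

    # Two tasks: (image-1, 0.5) and (image-2, 1.0).
    print(dot_product(images, parameters))

    # Four tasks: every image processed with every parameter value.
    print(cross_product(images, parameters))
```

The workflow description stays the same in both cases; only the strategy attached to the service inputs changes the number and composition of the tasks generated.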

Describe the scientific/technical community and the scientific/technical activity using (planning to use) the EGEE infrastructure. A high-level description is needed (neither a detailed specialist report nor a list of references).

MOTEUR (http://egee1.unice.fr/MOTEUR) is a workflow manager designed to support data-intensive applications, taking advantage of grid resources to transparently distribute the computations over a large set of resources. Currently, MOTEUR is used within the biomed VO to deploy medical image analysis applications. However, the scope of MOTEUR is broader, and it may be of interest for enacting many workflow-based scientific applications on the EGEE infrastructure.

Author

Dr Johan Montagnat (CNRS)

Co-authors

Dr Diane Lingrand (CNRS / I3S) Mr Tristan Glatard (CNRS / INRIA)
