BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//CERN//INDICO//EN
BEGIN:VEVENT
SUMMARY:MOTEUR: a data intensive service-based workflow engine enactor
DTSTART;VALUE=DATE-TIME:20060302T143000Z
DTEND;VALUE=DATE-TIME:20060302T150000Z
DTSTAMP;VALUE=DATE-TIME:20130523T110235Z
UID:indico-contribution-33@cern.ch
DESCRIPTION:Speakers: GLATARD\, Tristan (CNRS)\n** Managing data-intensive
  application workflows\n\nMany data analysis procedures implemented on gri
 ds are not\nbased on a single processing algorithm but are rather assembl
 ed from a set\nof basic tools that process the data\, model it\, e
 xtract\nquantitative information\, analyze results\, etc. Given that\ninte
 roperable algorithms packed in software components with a\nstandardized in
 terface enabling data exchanges are provided\, it is\npossible to build co
 mplex workflows to represent such procedures for\ndata analysis.
  High-level tools for expressing and handling the\ncomputation flow are
  therefore expected to ease the development of\ncomputerized medical
  experiments.\n\nWorkflow processing is a thoroughly researched area.
  Grid-enabled\napplications often need to process large datasets made
  of e.g.\nhundreds or thousands of data items according to the same
 \nworkflow pattern. We therefore propose a workflow enactment\nengine
  which:\n- Makes the description of the application workflow simple
  from the\n  application developer's point of view.\n- Enables the
  execution of legacy code.\n- Optimizes the performance of
  data-intensive applications by exploiting\n  the potential parallel
 ism of the grid infrastructure.\n\n** MOTEUR: an optimized service-based w
 orkflow engine\n\nMOTEUR stands for hoMe-made OpTimisEd scUfl enactoR. MOT
 EUR is written\nin Java and available under the CeCILL Public License
  (a GPL-compatible\nopen source license) at
  http://www.i3s.unice.fr/~glatard.\nThe workflow description language
  adopted is the Simple Conceptual\nUnified Flow Language (Scufl)\, which
  is used by Taverna and is\ncurrently becoming a st
 andard in the e-Science community.\n\nFigure 1 shows the MOTEUR web interf
 ace representing\na workflow that is being executed. Each service is repre
 sented by a\ncolored box and data links are represented by curves. The
  services are\ncolor-coded depending on their current status: gray
  services have
 \nnever been executed\; green services are running\; blue services have\nf
 inished the execution of all input data available\; and yellow\nservices a
 re not currently running but waiting for input data to\nbecome available.\
 n\nMOTEUR is interfaced to the job submission interfaces of both the EGEE\
 ninfrastructure and the Grid5000 experimental grid. In addition\,\nthe
  execution of lightweight jobs can be orchestrated on local\nresources.
  MOTEUR is able to submit different computing tasks to\ndifferent
  infrastructures during a single workflow execution. MOTEUR\nimplements
  an interface to both Web Services and GridRPC\napplication services.
 \n\nIn contrast to the t
 ask-based approach implemented in DAGMan\, MOTEUR\nis service-based. The s
 ervices paradigm has been widely adopted by\nmiddleware developers for the
  high level of flexibility that it\noffers. Application services are simil
 arly well suited for composing\ncomplex applications from basic processing
  algorithms. In addition\, the\nindependent description of application ser
 vices and the data to be\nprocessed makes this paradigm very efficient for
  processing large data\nsets. However\, this approach is less common for
  application code\, as it\nrequires all code to be instrumented with the
  commo
 n service\ninterface.\n\nTo ease the use of legacy code\, a generic wrappe
 r application service\nhas been developed. This grid submission service
  exposes a\nstandard web interface and controls the submission of
  any\nexecutable code. It relieves the user of the need to write a
 \nspecific service interface and recompile the application code. Only
  a\nsmall exe
 cutable invocation description file is required to enable the\ncommand lin
 e composition by the generic wrapper.\n\nTo enact different data-intensive
  applications\, MOTEUR implements two\ndata composition patterns. The data
  sets transmitted to a service can\nbe composed pairwise (each input of th
 e first input data set is\nprocessed with the corresponding input of the
  second one). This corresponds to the\ncase where the two input data sets
  are semantically
  connected. The\ndata sets can also be fully composed (all inputs of the f
 irst set are\nprocessed with all inputs of the second one). The use of the
 se two\ncomposition strategies significantly enlarges the expressiveness o
 f\nthe workflow language. It is a powerful tool for expressing complex\nda
 ta-intensive processing applications in a very compact format.\n\n
 Finally\, MOTEUR enables three different levels of parallelism for\n
 optimizing workflow application code execution:\n- workflow parallelism
  inherent to the workfl
 ow topology\;\n- data parallelism: different input data can be processed i
 ndependently in\n  parallel\;\n- services parallelism: different services 
 processing different data are\n  independent and can be executed in parall
 el.\nTo our knowledge\, MOTEUR is the first service-based workflow enactor
 \nimplementing all these optimizations.\n\n** Performance analysis on an i
 mage registration assessment application\n\nMedical image registration alg
 orithms play a key role in a very\nlarge number of medical image an
 alysis procedures. They are\nfundamental processing steps often needed
  prior to
  any subsequent\nanalysis. The Bronze Standard application\n(http://egee-n
 a4.ct.infn.it/biomed/BronzeStandard.html) \nis a statistical procedure aim
 ing at assessing the precision and\naccuracy of different registration alg
 orithms. The complex application\nworkflow is illustrated in figure 1. Thi
 s\ndata-intensive application requires the processing of as many input\nim
 age pairs as possible to extract relevant statistics.\n\nThe Bronze Standa
 rd application has been enacted on the EGEE\ninfrastructure through the MO
 TEUR workflow execution engine. A database\nof 126 image pairs\, courtesy of
  Dr Pierre-Yves Bondiau (cancer\ntreatment center "Antoine Lacassagne"\, N
 ice\, France)\, was used for\nthe computations. In total\, the workflow ex
 ecution resulted in 756\njob submissions. The different levels of optimiza
 tion implemented in\nMOTEUR permitted a speed-up higher than 9.1 when comp
 ared to a naive\nexecution of the workflow.\n\nSuch data-intensive applica
 tions are common in the medical image\nanalysis community and there is an 
 increasing need for compute\ninfrastructure capable of efficiently process
 ing large image\ndatabases. MOTEUR is a generic workflow engine that was d
 esigned to\nefficiently process data-intensive workflows. It is freely ava
 ilable\nfor download under a GPL-like license.\n\nhttp://indico.cern.ch/co
 ntributionDisplay.py?contribId=33&sessionId=15&confId=286
LOCATION:CERN 40-SS-C01
URL:http://indico.cern.ch/contributionDisplay.py?contribId=33&sessionId=15
 &confId=286
END:VEVENT
END:VCALENDAR
