EGEE User Forum

Europe/Zurich
CERN

Description

The EGEE (Enabling Grids for E-sciencE) project provides the largest production grid infrastructure for applications. In the first two years of the project, an increasing number of diverse user communities have been attracted by the possibilities offered by EGEE and have joined the initial user communities. The EGEE user community feels it is now appropriate to meet to share experiences and to set new targets for the future, including both the evolution of the existing applications and the development and deployment of new applications onto the EGEE infrastructure.

The EGEE User Forum will provide an important opportunity for innovative applications to establish contacts with EGEE and with other user communities, to plan for the future usage of the EGEE grid infrastructure, to learn about the latest advances, and to discuss the future evolution of the grid middleware. The main goal is to create a dynamic user community, starting from the base of existing users, which can increase the effectiveness of the current EGEE applications and promote the fast and efficient uptake of grid technology by new disciplines. EGEE fosters pioneering usage of its infrastructure by encouraging collaboration between diverse scientific disciplines. It does this to evolve and expand the services offered to the EGEE user community, maximising the scientific, technological and economic relevance of grid-based activities.

We would like to invite hands-on users of the EGEE Grid Infrastructure to submit an abstract for this event, following the suggested template.

EGEE User Forum Web Page
Participants
  • Adrian Vataman
  • Alastair Duncan
  • Alberto Falzone
  • Alberto Ribon
  • Ales Krenek
  • Alessandro Comunian
  • Alexandru Tudose
  • Alexey Poyda
  • Algimantas Juozapavicius
  • Alistair Mills
  • Alvaro del Castillo San Felix
  • Andrea Barisani
  • Andrea Caltroni
  • Andrea Ferraro
  • Andrea Manzi
  • Andrea Rodolico
  • Andrea Sciabà
  • Andreas Gisel
  • Andreas-Joachim Peters
  • Andrew Maier
  • Andrey Kiryanov
  • Aneta Karaivanova
  • Antonio Almeida
  • Antonio De la Fuente
  • Antonio Laganà
  • Antony Wilson
  • Arnaud PIERSON
  • Arnold Meijster
  • Benjamin Gaidioz
  • Beppe Ugolotti
  • Birger Koblitz
  • Bjorn Engsig
  • Bob Jones
  • Boon Low
  • Catalin Cirstoiu
  • Cecile Germain-Renaud
  • Charles Loomis
  • CHOLLET Frédérique
  • Christian Saguez
  • Christoph Langguth
  • Christophe Blanchet
  • Christophe Pera
  • Claudio Arlandini
  • Claudio Grandi
  • Claudio Vella
  • Claudio Vuerli
  • Claus Jacobs
  • Craig Munro
  • Cristian Dittamo
  • Cyril L'Orphelin
  • Daniel JOUVENOT
  • Daniel Lagrava
  • Daniel Rodrigues
  • David Colling
  • David Fergusson
  • David Horn
  • David Smith
  • David Weissenbach
  • Davide Bernardini
  • Dezso Horvath
  • Dieter Kranzlmüller
  • Dietrich Liko
  • Dmitry Mishin
  • Doina Banciu
  • Domenico Vicinanza
  • Dominique Hausser
  • Eike Jessen
  • Elena Slabospitskaya
  • Elena Tikhonenko
  • Elisabetta Ronchieri
  • Emanouil Atanassov
  • Eric Yen
  • Erwin Laure
  • Esther Acción García
  • Ezio Corso
  • Fabrice Bellet
  • Fabrizio Pacini
  • Federica Fanzago
  • Fernando Felix-Redondo
  • Flavia Donno
  • Florian Urmetzer
  • Florida Estrella
  • Fokke Dijkstra
  • Fotis Georgatos
  • Fotis Karayannis
  • Francesco Giacomini
  • Francisco Casatejón
  • Frank Harris
  • Frederic Hemmer
  • Gael Youinou
  • Gaetano Maron
  • Gavin McCance
  • Gergely Sipos
  • Giorgio Maggi
  • Giorgio Pauletto
  • Giovanna Stancanelli
  • Giuliano Pelfer
  • Giuliano Taffoni
  • Giuseppe Andronico
  • Giuseppe Codispoti
  • Hannah Cumming
  • Hannelore Hammerle
  • Hans Gankema
  • Harald Kornmayer
  • Horst Schwichtenberg
  • Huard Helene
  • Hugues BENOIT-CATTIN
  • Hurng-Chun LEE
  • Ian Bird
  • Ignacio Blanquer
  • Ilyin Slava
  • Iosif Legrand
  • Isabel Campos Plasencia
  • Isabelle Magnin
  • Jacq Florence
  • Jakub Moscicki
  • Jan Kmunicek
  • Jan Svec
  • Jaouher KERROU
  • Jean Salzemann
  • Jean-Pierre Prost
  • Jeremy Coles
  • Jiri Kosina
  • Joachim Biercamp
  • Johan Montagnat
  • John Walk
  • John White
  • Jose Antonio Coarasa Perez
  • José Luis Vazquez
  • Juha Herrala
  • Julia Andreeva
  • Kerstin Ronneberger
  • Kiril Boyanov
  • Konstantin Skaburskas
  • Ladislav Hluchy
  • Laura Cristiana Voicu
  • Laura Perini
  • Leonardo Arteconi
  • Livia Torterolo
  • Losilla Guillermo Anadon
  • Luciano Milanesi
  • Ludek Matyska
  • Lukasz Skital
  • Luke Dickens
  • Malcolm Atkinson
  • Marc Rodriguez Espadamala
  • Marc-Elian Bégin
  • Marcel Kunze
  • Marcin Plociennik
  • Marco Cecchi
  • Mariusz Sterzel
  • Marko Krznaric
  • Markus Schulz
  • Martin Antony Walker
  • Massimo Lamanna
  • Massimo Marino
  • Miguel Cárdenas Montes
  • Mike Mineter
  • Mikhail Zhizhin
  • Mircea Nicolae Tugulea
  • Monique Petitdidier
  • Muriel Gougerot
  • Nadezda Fialko
  • Nadine Neyroud
  • Nick Brook
  • Nicolas Jacq
  • Nicolas Ray
  • Nils Buss
  • Nuno Santos
  • Osvaldo Gervasi
  • Othmane Bouhali
  • Owen Appleton
  • Pablo Saiz
  • Panagiotis Louridas
  • Pasquale Pagano
  • Patricia Mendez Lorenzo
  • Pawel Wolniewicz
  • Pedro Andrade
  • Peter Kacsuk
  • Peter Praxmarer
  • Philippa Strange
  • Philippe Renard
  • Pier Giovanni Pelfer
  • Pietro Lio
  • Pietro Liò
  • Rafael Leiva
  • Remi Mollon
  • Ricardo Brito da Rocha
  • Riccardo di Meo
  • Robert Cohen
  • Roberta Faggian Marque
  • Roberto Barbera
  • Roberto Santinelli
  • Rolandas Naujikas
  • Rolf Kubli
  • Rolf Rumler
  • Romier Genevieve
  • Rosanna Catania
  • Sabine ELLES
  • Sandor Suhai
  • Sergio Andreozzi
  • Sergio Fantinel
  • Shkelzen RUGOVAC
  • Silvano Paoli
  • Simon Lin
  • Simone Campana
  • Soha Maad
  • Stefano Beco
  • Stefano Cozzini
  • Stella Shen
  • Stephan Kindermann
  • Steve Fisher
  • tao-sheng CHEN
  • Texier Romain
  • Toan Nguyen
  • Todor Gurov
  • Tomasz Szepieniec
  • Tony Calanducci
  • Torsten Antoni
  • Tristan Glatard
  • Valentin Vidic
  • Valerio Venturi
  • Vangelis Floros
  • Vaso Kotroni
  • Venicio Duic
  • Vicente Hernandez
  • Victor Lakhno
  • Viet Tran
  • Vincent Breton
  • Vincent LEFORT
  • Vladimir Voznesensky
  • Wei-Long Ueng
  • Ying-Ta Wu
  • Yury Ryabov
  • Ákos Frohner
    • 13:00 14:00
      Lunch 1h
    • 12:30 14:00
      Lunch 1h 30m
    • 14:00 18:30
      2a: Workload management and Workflows 40-SS-C01

      • 14:00
        Logging and Bookkeeping and Job Provenance services 30m
        The Logging and Bookkeeping (LB) service is responsible for keeping track of jobs within a complex Grid environment. Without such a service, users cannot find out what happened to their lost jobs, and Grid administrators cannot improve the infrastructure. The LB service developed within the EGEE project provides a distributed, scalable solution able to deal with hundreds of thousands of jobs on large Grids. However, to provide the necessary scalability and to avoid slowing down the processing of jobs within the middleware, it is based on a non-blocking asynchronous model. This means that the order of events sent to LB by individual parts of the middleware (user interface, scheduler, computing element, ...) is not guaranteed. While dealing with such out-of-order events, the LB may provide information that looks inconsistent with what the user knows from some other source (e.g. an independent notification about the job state). The lecture will reveal the internal design of LB, and we will discuss how the LB results (i.e. the job state) should be interpreted. While LB deals with active jobs only, Job Provenance (JP) is designed to store information about all jobs that run on a Grid indefinitely. All the relevant information needed to re-submit the job in the same environment is stored, including the specification of the computing environment. Users can annotate stored records, providing yet another metadata layer that is useful, e.g., for job grouping and data mining over the JP. We will provide basic information about the JP and its use, and we are looking for feedback for its improvement.
        Speaker: Prof. Ludek Matyska (CESNET, z.s.p.o.)
        Slides
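        The abstract above notes that events may reach LB in any order. One way to picture an order-independent job state is the minimal sketch below. It is not the LB implementation: the state names, their ranking, and the `Event` type are simplified assumptions made for illustration only.

```python
from dataclasses import dataclass

# Hypothetical, simplified ranking of job states (an assumption for this
# sketch; the real LB state machine is richer than a linear order).
STATE_RANK = {
    "SUBMITTED": 0,
    "WAITING": 1,
    "READY": 2,
    "SCHEDULED": 3,
    "RUNNING": 4,
    "DONE": 5,
}


@dataclass(frozen=True)
class Event:
    job_id: str
    source: str  # e.g. "user interface", "scheduler", "computing element"
    state: str   # job state implied by this event


def job_state(events):
    """Return the most advanced state implied by the events seen so far.

    The answer depends only on the set of events, not on arrival order.
    """
    if not events:
        return None
    return max((e.state for e in events), key=STATE_RANK.__getitem__)


# Events listed in arrival order, which differs from the logical order.
arrived = [
    Event("job-1", "computing element", "RUNNING"),
    Event("job-1", "user interface", "SUBMITTED"),
    Event("job-1", "scheduler", "SCHEDULED"),
]
print(job_state(arrived))
```

        In this sketch, a reader who has only received the last two events would see the job as scheduled while another source already reports it as running, which is the kind of apparent inconsistency the abstract describes.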
      • 14:30
        The gLite Workload Management System 30m
        The Workload Management System (WMS) is a collection of components providing a service responsible for the distribution and management of tasks across resources available on a Grid, in such a way that applications are conveniently, efficiently and effectively executed. The main purpose of the WMS as a whole is to accept a request for the execution of a job from a client, find appropriate resources to satisfy it, and follow it until completion, possibly rescheduling it, totally or in part, if an infrastructure failure occurs. A job is always associated with the credentials of the user who submitted it. All the operations performed by the WMS in order to complete the job are done on behalf of the owning user. A mechanism exists to renew credentials automatically and safely for long-running jobs. The different aspects of job management are handled by different WMS components, usually implemented as different processes communicating via data structures stored persistently on disk, to avoid data loss as much as possible in case of failure. Recent releases of the WMS come with a Web Service interface that has replaced the custom interface previously adopted. The move to formal or de facto standards will continue in the future. In order to track a job during its lifetime, relevant events (such as submission, resource matching, running, completion) are gathered from various WMS components as well as from Grid resources (typically Computing Elements), which are properly instrumented. Events are kept persistently by the Logging and Bookkeeping Service (LB) and indexed by a unique, URL-like job identifier. The LB also offers a query interface both for the logged raw events and for the higher-level task state. Multiple LBs may exist, but a job is statically assigned to one of them. Since the LB is designed, implemented and deployed to be highly reliable and available, the WMS relies heavily on it as the authoritative source for job information.
        The types of job currently supported by the WMS are diverse: batch-like, simple workflow in the form of Directed Acyclic Graphs (DAGs), collection, parametric, interactive, MPI, partitionable, checkpointable. The characteristics of a job are expressed using a flexible language called Job Description Language (JDL). The JDL also allows the specification of constraints and preferences on the resources that can be used to execute the job. Moreover, some attributes are useful for the management of the job itself, for example how persistently to retry a job in case of repeated failures or lack of resources. Of the above job types, the parametric jobs, the collections, and the workflows have recently received special attention. A parametric job allows the submission of a large number of almost identical jobs by simply specifying a parameterized description and the list of values for the parameter. A collection allows the submission of a number of jobs as a single entity. An interesting feature in this case is the possibility of specifying a shared input sandbox. The input sandbox is a group of files that the user wishes to be available on the computer where the job runs. Sharing a sandbox allows some significant optimization of network traffic and, for example, can greatly reduce the submission time. Support for workflows in the gLite WMS is currently limited to Directed Acyclic Graphs (DAGs), consisting of a set of jobs and a set of dependencies between them. Dependencies represent time constraints: a child cannot start before all its parents have successfully completed. In general, jobs are scheduled independently and the choice of the computing resource on which to execute a job is made as late as possible. A recently added feature allows jobs to be collocated on the same resource. Future improvements will mainly concern error handling and integration with data management.
        Parametric jobs, collections and workflows have their own job identifier, so that all the jobs belonging to them can be controlled either independently or as a single entity. Future developments of the WMS will follow three main lines: stronger integration with other services, software cleanup, and scalability. The WMS already interacts with many external services, such as Logging and Bookkeeping, Computing Elements, Storage Elements, Service Discovery, Information System, Replica Catalog, and the Virtual Organization Membership Service (VOMS). Integration with a policy engine (G-PBox) and an accounting system (DGAS) is progressing; this will ease the enforcement of local and global policies regulating the execution of tasks over the Grid, giving fine control over how the available resources can be used. Designing and implementing a WMS that relies on external services for the above functionality is certainly more difficult than providing a monolithic system, but doing so favors a generic solution that is not application specific and can be deployed in a variety of environments. The cleanup will affect not only the existing code base, but will also aim at improving software usability and at simplifying service deployment and management. This effort will require the evaluation and possibly the re-organization of the current components, while keeping the interface unchanged. Last but not least, considerable effort needs to be spent on the scalability of the service. The functionality currently offered already allows many kinds of applications to port their computing model onto the Grid. In addition, some of those applications have demanding requirements on the amount of resources (computing, storage, network, and data) that they need to access in order to accomplish their goal. The WMS is already designed and implemented to operate in an environment with multiple running instances that do not communicate with each other and that see the same resources.
        This certainly helps when the available WMSs become overloaded: relieving them is almost as simple as starting another instance. Unfortunately, this approach cannot be extended much further because it would cause too much contention for the available resources. Hence the short-term objective is to make a single WMS instance able to manage 100000 jobs per day. In the longer term it will be possible to deploy a cluster of instances sharing the same state.
        Speaker: Francesco Giacomini (Istituto Nazionale di Fisica Nucleare (INFN))
        Slides
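        The DAG rule stated in the abstract above (a child cannot start before all its parents have successfully completed) can be illustrated with a small sketch. This is not WMS code and does not use JDL: the job names and the `parents` mapping are hypothetical, and the "waves" only simulate which jobs become eligible together.

```python
def ready_jobs(parents, completed):
    """Jobs not yet completed whose parents have all completed."""
    return sorted(
        job for job, deps in parents.items()
        if job not in completed and deps <= completed
    )


def run_order(parents):
    """Simulate execution in waves; each wave holds jobs that may start together."""
    completed = set()
    waves = []
    while len(completed) < len(parents):
        wave = ready_jobs(parents, completed)
        if not wave:
            # No job can start although some remain: the graph is not a DAG
            # (or a dependency refers to an unknown job).
            raise ValueError("unsatisfiable dependencies")
        waves.append(wave)
        completed.update(wave)  # assume every job in the wave succeeds
    return waves


# Hypothetical workflow: each job maps to the set of its parents.
parents = {
    "prepare": set(),
    "simulate_a": {"prepare"},
    "simulate_b": {"prepare"},
    "merge": {"simulate_a", "simulate_b"},
}
print(run_order(parents))
```

        Here `simulate_a` and `simulate_b` become eligible in the same wave and could be scheduled independently, while `merge` has to wait for both.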
      • 15:00
        BOSS: the CMS interface for job submission, monitoring and bookkeeping 30m
        BOSS (Batch Object Submission System) has been developed in the context of the CMS experiment to provide logging, bookkeeping and real-time monitoring of jobs submitted to a local farm or a grid system. The information is persistently stored in a relational database (currently MySQL or SQLite) for further processing. In this way the information that was available in the log file in free form is structured in a fixed form that allows easy and efficient access. The database is local to the user environment and is not required to provide server capabilities to the external world: the only component that interacts with it is the BOSS client process. BOSS can log not only the typical information provided by the batch systems (e.g. executable name, time of submission and execution, return status, etc.), but also information specific to the job that is being executed (e.g. dataset that is being produced or analyzed, number of events done so far, number of events to be done, etc.). This is done by means of user-supplied filters: BOSS extracts the user-program information to be logged from the standard streams of the job itself, filling a fixed-form journal file that is retrieved and processed by the BOSS client process when the job has finished running. BOSS interfaces to a local or grid scheduler (e.g. LSF, PBS, Condor, LCG, etc.) through a set of scripts provided by the system administrator, using a predefined syntax. This hides the scheduler's implementation details from the upper layers, in particular whether the batch system is local or distributed. The interface provides the capability to register, un-register and list the schedulers. BOSS provides an interface to the local scheduler for the operations of job submission, deletion, querying and output retrieval. At output retrieval time, the information in the database is updated using information sent back with the job.
        BOSS also provides an optional run-time monitoring system that, working in parallel with the logging system, collects information while the computational program is still running and presents it to the upper layers through the same interface. The real-time information sent by the running jobs is collected in a separate database server; the same real-time database server may support more than one BOSS database. The information in the real-time database server has a limited lifetime: in general it is deleted after the user has accessed it, and in any case after successful retrieval of the journal file. It is not possible to use the information in the real-time database server to update the logging information in the BOSS database once the journal file for the related job has been processed. The run-time monitoring is performed by a client-updater pair registered as a plug-in module: these are the only components that interact with the real-time database. The real-time updater is a client of the real-time database server: it sends the information in the journal file to the server at predefined time intervals. The real-time client is a tool used by BOSS to update its database using the real-time information. The interface with the user is provided through: a command line, kept as similar as possible to that of previous versions, which is the minimal way to access BOSS functionality and serves as a straightforward test and training instrument; a C++ API, which increases functionality and ease of use for programs using BOSS, is currently under development and is meant to grow with user requirements; and a Python API, which gives almost the same functionality as the C++ one, plus the possibility of running BOSS from a Python command line. User programs may be chained together to be executed by a single batch unit (job). The relational structure supports not only multiple programs per job (program chains) but also multiple jobs per chain (in the event of job resubmission).
        Homogeneous jobs, or better "chains of programs", may be grouped together in tasks (e.g. as a consequence of the splitting of a single processing chain into many processing chains that may run in parallel). The description of a task is passed to BOSS through an XML file, since XML can model the hierarchical structure of a task in a natural way. The process submitted to the batch scheduler is the BOSS job wrapper. All interactions of the batch scheduler with the user process pass through the BOSS wrapper. The BOSS job wrapper starts the chosen chaining tool and, optionally, the real-time updater. An internal tool for chaining programs linearly is implemented in BOSS, but in future external chaining tools may be registered with BOSS so that users can request more complex chaining rules. BOSS will not need to know how they work and will just pass any configuration information transparently down to them. The chaining tool starts a BOSS “program wrapper” for each user program. The program wrapper starts all processes needed to get the run-time information from the user programs into the journal file. This program wrapper is unique and has to be started with only one parameter, the program id. The BOSS client determines which jobs have finished by querying the scheduler. It retrieves the output for those jobs and uses the information in the journal file to update the BOSS database. The BOSS client pops the information about running jobs from the real-time database server through the client part of the registered Real Time Monitor. It also deletes from the server the information concerning jobs for which the BOSS database has already been updated using the journal file. The information extracted from the real-time database server may be used to update the local BOSS database or just to show the latest status to the user.
        Speaker: Giuseppe Codispoti (Universita di Bologna)
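        The filter mechanism described in the abstract above (extracting job-specific values from the job's standard streams into a fixed-form journal, then updating a local relational database) might look roughly like the sketch below. It is only an illustration, not BOSS code: the patterns, field names, table layout and sample output are all hypothetical, and an in-memory SQLite database stands in for the user's local database.

```python
import re
import sqlite3

# Hypothetical user-supplied filter: one pattern per journal field.
FILTER = {
    "dataset": re.compile(r"^Dataset:\s*(\S+)"),
    "events_done": re.compile(r"^Processed\s+(\d+)\s+events"),
}


def build_journal(stdout_lines):
    """Turn free-form job output into fixed-form (field -> value) entries."""
    journal = {}
    for line in stdout_lines:
        for field, pattern in FILTER.items():
            match = pattern.match(line)
            if match:
                journal[field] = match.group(1)  # later lines overwrite earlier ones
    return journal


def update_database(conn, job_id, journal):
    """Store the journal entries for one job in a local table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_info ("
        "job_id TEXT, field TEXT, value TEXT, PRIMARY KEY (job_id, field))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO job_info VALUES (?, ?, ?)",
        [(job_id, field, value) for field, value in journal.items()],
    )
    conn.commit()


# Made-up job output, for illustration only.
stdout = [
    "Dataset: sample_A",
    "Processed 100 events",
    "some unrelated log line",
    "Processed 250 events",
]
journal = build_journal(stdout)
conn = sqlite3.connect(":memory:")
update_database(conn, "job-1", journal)
rows = dict(conn.execute(
    "SELECT field, value FROM job_info WHERE job_id = ?", ("job-1",)
))
print(rows)
```

        Keeping the filter as data separate from the storage step mirrors the split in the abstract between user-supplied filters and the client process that updates the database.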
      • 15:30
        MOTEUR: a data intensive service-based workflow engine enactor 30m
        ** Managing data-intensive application workflows
        Many data analysis procedures implemented on grids are not based on a single processing algorithm but are instead assembled from a set of basic tools dedicated to processing the data, modeling it, extracting quantitative information, analyzing results, etc. Provided that interoperable algorithms are packaged in software components with a standardized interface enabling data exchange, it is possible to build complex workflows to represent such data analysis procedures. High-level tools for expressing and handling the computation flow are therefore expected to ease the development of computerized medical experiments. Workflow processing is a thoroughly researched area. Grid-enabled applications often need to process large datasets made of, e.g., hundreds or thousands of data items to be processed according to the same workflow pattern. We are therefore proposing a workflow enactment engine which:
        - Makes the description of the application workflow simple from the application developer's point of view.
        - Enables the execution of legacy code.
        - Optimizes the performance of data-intensive applications by exploiting the potential parallelism of the grid infrastructure.
        ** MOTEUR: an optimized service-based workflow engine
        MOTEUR stands for hoMe-made OpTimisEd scUfl enactoR. MOTEUR is written in Java and available under the CeCILL Public License (a GPL-compatible open source license) at http://www.i3s.unice.fr/~glatard. The workflow description language adopted is the Simple Concept Unified Flow Language (Scufl) used by Taverna, which is currently becoming a standard in the e-Science community. Figure 1 shows the MOTEUR web interface representing a workflow that is being executed. Each service is represented by a colored box and data links are represented by curves. The services are color coded depending on their current status: gray services have never been executed; green services are running; blue services have finished processing all the input data available; and yellow services are not currently running but are waiting for input data to become available. MOTEUR is interfaced to the job submission interfaces of both the EGEE infrastructure and the Grid5000 experimental grid. In addition, the execution of lightweight jobs can be orchestrated on local resources. MOTEUR is able to submit different computing tasks to different infrastructures during a single workflow execution. MOTEUR implements an interface to both Web Services and GridRPC application services. In contrast to the task-based approach implemented in DAGMan, MOTEUR is service-based. The service paradigm has been widely adopted by middleware developers for the high level of flexibility that it offers. Application services are similarly well suited for composing complex applications from basic processing algorithms. In addition, the independent description of application services and of the data to be processed makes this paradigm very efficient for processing large data sets. However, this approach is less common for application code, as it requires all codes to be instrumented with the common service interface. To ease the use of legacy code, a generic wrapper application service has been developed. This grid submission service exposes a standard web interface and controls the submission of any executable code. It frees users from the need to write a specific service interface and to recompile their application code. Only a small description file of the executable invocation is required to enable the generic wrapper to compose the command line. To enact different data-intensive applications, MOTEUR implements two data composition patterns. The data sets transmitted to a service can be composed pairwise (each input of the first data set is processed with the corresponding input of the second one). This corresponds to the case where the two input data sets are semantically connected. The data sets can also be fully composed (all inputs of the first set are processed with all inputs of the second one). The use of these two composition strategies significantly enlarges the expressiveness of the workflow language. It is a powerful tool for expressing complex data-intensive processing applications in a very compact format. Finally, MOTEUR enables 3 different levels of parallelism for optimizing the execution of workflow application code:
        - workflow parallelism, inherent to the workflow topology;
        - data parallelism: different input data can be processed independently in parallel;
        - services parallelism: different services processing different data are independent and can be executed in parallel.
        To our knowledge, MOTEUR is the first service-based workflow enactor implementing all these optimizations.
        ** Performance analysis on an image registration assessment application
        Medical image registration algorithms play a key role in a very large number of medical image analysis procedures. They are fundamental processing steps often needed prior to any subsequent analysis. The Bronze Standard application (http://egee-na4.ct.infn.it/biomed/BronzeStandard.html) is a statistical procedure aiming at assessing the precision and accuracy of different registration algorithms. The complex application workflow is illustrated in figure 1. This data-intensive application requires the processing of as many input image pairs as possible to extract relevant statistics. The Bronze Standard application has been enacted on the EGEE infrastructure through the MOTEUR workflow execution engine. A database of 126 image pairs, courtesy of Dr Pierre-Yves Bondiau (cancer treatment center "Antoine Lacassagne", Nice, France), was used for the computations. In total, the workflow execution resulted in 756 job submissions. The different levels of optimization implemented in MOTEUR permitted a speed-up higher than 9.1 when compared to a naive execution of the workflow. Such data-intensive applications are common in the medical image analysis community and there is an increasing need for compute infrastructures capable of efficiently processing large image databases. MOTEUR is a generic workflow engine that was designed to process data-intensive workflows efficiently. It is freely available for download under a GPL-like license.
        Speaker: Tristan Glatard (CNRS)
        Slides
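        The two data composition patterns named in the abstract above can be sketched in a few lines. This is an illustration only, not MOTEUR or Scufl code: the function names and the file names are hypothetical, and the sketch assumes that pairwise composition matches inputs by position.

```python
from itertools import product


def compose_pairwise(first, second):
    """Pair the i-th input of one set with the i-th input of the other."""
    if len(first) != len(second):
        raise ValueError("pairwise composition needs sets of equal size")
    return list(zip(first, second))


def compose_full(first, second):
    """Combine every input of one set with every input of the other."""
    return list(product(first, second))


# Hypothetical inputs: images and the reference image each one belongs with.
images = ["img1", "img2", "img3"]
references = ["ref1", "ref2", "ref3"]

print(compose_pairwise(images, references))   # 3 invocations of the service
print(len(compose_full(images, references)))  # 9 invocations of the service
```

        With n inputs on each side, pairwise composition yields n service invocations and full composition yields n * n, which is why the choice of pattern matters for data-intensive workflows.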
      • 16:00
        Coffee break 30m
      • 16:30
        K-Wf Grid: Knowledge-based Workflows in Grid 30m
        We present an IST project of the 6th Framework Programme, aimed at intelligent grid middleware and workflow construction. The project's acronym, K-Wf Grid, stands for “Knowledge-based Workflow System for Grid Applications”. The project employs ontologies, artificial reasoning, Petri nets and modern service-oriented architectures in order to simplify the use of grid infrastructures, as well as the integration of applications into the grid. The K-Wf Grid system is composed of a set of modules. The most visible one is the collaboration portal, from which a user can control the infrastructure and manage his/her application workflows. Behind this portal are services that handle workflow management, monitoring of applications and infrastructure, and knowledge extraction, management and reuse. The project has completed its prototype phase and passed a review by the Commission. The idea of the project is based on the observation that users often have to learn not only how to use the grid, but also how best to take advantage of its components and how to avoid problems caused by faulty middleware, application modules and the inherently dynamic behavior of the grid infrastructure as a whole. Additionally, with the coming era of resources virtualized as web and grid services, dynamic virtual organizations and widespread resource sharing, the number of variables to be taken into account is increasing. Therefore we tried to devise a user layer above the infrastructure that would be able to handle as much of the learning and remembering as possible. This layer should be able to observe what happens during application execution, infer new knowledge from these observations and use this knowledge the next time an application is executed. This way the system would, over time, optimize its behavior and its use of available resources. The realization of this idea has been split into several tasks and formed into the architecture that became the K-Wf Grid project.
        The main interaction of users with the system occurs through the Web Portal. Through it, users can access the grid, its data and services, obtain information stored in the knowledge management system, add new facts to it, and construct and execute workflows. The portal consists of three main parts: the Grid Workflow User Interface (GWUI), the User Assistant Agent (UAA) interface, and the portal framework based on GridSphere, including collaboration tools from the Sakai project and interfaces to other K-Wf Grid modules. GWUI is a Java applet that visualizes a Petri net-modeled workflow of services, in which the user can construct a workflow, execute it and monitor it. UAA is an advisor, which communicates to the user all important facts about his/her current context: the services he/she is considering using and the data he/she has or needs. Apart from automatically generated data, the displayed information also contains hints entered by other users, which may help anyone to select better data or services or to avoid problems with certain workflow configurations. This way the users may collaborate and share knowledge. Under the Web Portal lies the Workflow Orchestration and Execution module, composed of several components. Together, these components are able to read a definition of an abstract workflow, expand this definition into a regular workflow of calls to service interfaces, map these calls to real service instances and execute this workflow to obtain the expected results described in the original abstract workflow. This way the user does not need to know all the services that are present in the grid and is required only to state what result is needed. To be able to abstract the grid in the way described above, the system has to know the semantics of the grid environment it operates on, and so we need to employ serious knowledge management, computer-based learning and reasoning.
        This is the area of the Knowledge module, which is split into the storage part, the Grid Organization Memory (GOM), and the learning part, the Knowledge Assimilation Agent (KAA). KAA takes observed events from the monitoring system, maps them to the context of the performed operation and extracts new facts from them. These facts are then stored in GOM and used in later workflow composition tasks in order to predict service performance. GOM itself stores all information about the available application services in a layered ontology, and new applications may easily be added to its structure by describing their respective domains in an ontology connected to the general ontology layer developed in K-Wf Grid. The monitoring infrastructure is integrated into the original grid middleware, with the Grid Performance Monitoring and Instrumentation Service (GPMIS) as its processing core. GPMIS receives information from a network of sensors embedded in the middleware, in the application services (where it is possible to instrument the services) and in the other K-Wf Grid modules. Apart from collecting observations for the learning modules, the monitoring infrastructure is also a comprehensive tool for performance monitoring and tuning, with convenient visual tools in the user portal. At the bottom of the architecture lies the grid itself: the application services, data storage nodes and communication lines. K-Wf Grid has three distinct and varied pilot applications, which it uses to test the developed modules. One of them is a flood prediction suite, developed from a previous effort in the CROSSGRID project. It consists of a set of several simulation models for meteorology, hydrology and hydraulics, as well as support and visualization tools, all instantiated as WSRF services. The second application is from the business area: a web service-based ERP system. The third application is a system for coordinated traffic management in the city of Genoa.
        Speaker: Ladislav Hluchy (Institute of Informatics, Slovakia)
        Slides
      • 17:00
        G-PBox: A framework for grid policy management 30m
        Sharing computing and storage resources among multiple Virtual Organizations, which group people from different institutions often spanning many countries, requires a comprehensive policy management framework. This paper introduces G-PBox, a tool for the management of policies which integrates with other VO-based tools such as VOMS (an attribute authority) and DGAS (an accounting system) to provide a framework for writing, administering and utilizing policies in a Grid environment.
        Speaker: Mr Andrea Caltroni (INFN)
        Slides
      • 17:30
        IBM strategic directions in workload virtualization 30m
        Workload virtualization comprises several disciplines: job/workflow scheduling, workload management, and provisioning. Much work has been spent so far on these components in isolation. A better, synergistic integration of these components is necessary, allowing them to interoperate towards an optimized resource allocation that satisfies user-specified service level objectives. Other challenges in the grid space concern meta-scheduling and adaptive/dynamic workflow scheduling. In this talk, we present IBM strategic directions in the workload virtualization area. We also briefly introduce our current product portfolio in that space and describe how it may evolve over time, based on customer requirements and the additional business value that satisfying them could provide.
        Speaker: Dr Jean-Pierre Prost (IBM Montpellier)
        Slides
    • 13:00 14:00
      Lunch 1h