EGEE User Forum




The EGEE (Enabling Grids for E-sciencE) project provides the largest production grid infrastructure for applications. In the first two years of the project, an increasing number of diverse user communities have been attracted by the possibilities offered by EGEE and have joined the initial user communities. The EGEE user community feels it is now appropriate to meet, to share experiences, and to set new targets for the future, including both the evolution of existing applications and the development and deployment of new applications on the EGEE infrastructure.

The EGEE User Forum will provide an important opportunity for innovative applications to establish contacts with EGEE and with other user communities, to plan for future usage of the EGEE grid infrastructure, to learn about the latest advances, and to discuss the future evolution of the grid middleware. The main goal is to create a dynamic user community, starting from the base of existing users, that can increase the effectiveness of current EGEE applications and promote the fast and efficient uptake of grid technology by new disciplines. EGEE fosters pioneering usage of its infrastructure by encouraging collaboration between diverse scientific disciplines; it does this to evolve and expand the services offered to the EGEE user community, maximising the scientific, technological and economic relevance of grid-based activities.

We would like to invite hands-on users of the EGEE Grid Infrastructure to submit an abstract for this event, following the suggested template.

EGEE User Forum Web Page
  • Adrian Vataman
  • Alastair Duncan
  • Alberto Falzone
  • Alberto Ribon
  • Ales Krenek
  • Alessandro Comunian
  • Alexandru Tudose
  • Alexey Poyda
  • Algimantas Juozapavicius
  • Alistair Mills
  • Alvaro del Castillo San Felix
  • Andrea Barisani
  • Andrea Caltroni
  • Andrea Ferraro
  • Andrea Manzi
  • Andrea Rodolico
  • Andrea Sciabà
  • Andreas Gisel
  • Andreas-Joachim Peters
  • Andrew Maier
  • Andrey Kiryanov
  • Aneta Karaivanova
  • Antonio Almeida
  • Antonio De la Fuente
  • Antonio Laganà
  • Antony Wilson
  • Arnaud Pierson
  • Arnold Meijster
  • Benjamin Gaidioz
  • Beppe Ugolotti
  • Birger Koblitz
  • Bjorn Engsig
  • Bob Jones
  • Boon Low
  • Catalin Cirstoiu
  • Cecile Germain-Renaud
  • Charles Loomis
  • Chollet Frédérique
  • Christian Saguez
  • Christoph Langguth
  • Christophe Blanchet
  • Christophe Pera
  • Claudio Arlandini
  • Claudio Grandi
  • Claudio Vella
  • Claudio Vuerli
  • Claus Jacobs
  • Craig Munro
  • Cristian Dittamo
  • Cyril L'Orphelin
  • Daniel Jouvenot
  • Daniel Lagrava
  • Daniel Rodrigues
  • David Colling
  • David Fergusson
  • David Horn
  • David Smith
  • David Weissenbach
  • Davide Bernardini
  • Dezso Horvath
  • Dieter Kranzlmüller
  • Dietrich Liko
  • Dmitry Mishin
  • Doina Banciu
  • Domenico Vicinanza
  • Dominique Hausser
  • Eike Jessen
  • Elena Slabospitskaya
  • Elena Tikhonenko
  • Elisabetta Ronchieri
  • Emanouil Atanassov
  • Eric Yen
  • Erwin Laure
  • Esther Acción García
  • Ezio Corso
  • Fabrice Bellet
  • Fabrizio Pacini
  • Federica Fanzago
  • Fernando Felix-Redondo
  • Flavia Donno
  • Florian Urmetzer
  • Florida Estrella
  • Fokke Dijkstra
  • Fotis Georgatos
  • Fotis Karayannis
  • Francesco Giacomini
  • Francisco Casatejón
  • Frank Harris
  • Frederic Hemmer
  • Gael Youinou
  • Gaetano Maron
  • Gavin McCance
  • Gergely Sipos
  • Giorgio Maggi
  • Giorgio Pauletto
  • Giovanna Stancanelli
  • Giuliano Pelfer
  • Giuliano Taffoni
  • Giuseppe Andronico
  • Giuseppe Codispoti
  • Hannah Cumming
  • Hannelore Hammerle
  • Hans Gankema
  • Harald Kornmayer
  • Horst Schwichtenberg
  • Huard Helene
  • Hurng-Chun Lee
  • Ian Bird
  • Ignacio Blanquer
  • Ilyin Slava
  • Iosif Legrand
  • Isabel Campos Plasencia
  • Isabelle Magnin
  • Jacq Florence
  • Jakub Moscicki
  • Jan Kmunicek
  • Jan Svec
  • Jaouher Kerrou
  • Jean Salzemann
  • Jean-Pierre Prost
  • Jeremy Coles
  • Jiri Kosina
  • Joachim Biercamp
  • Johan Montagnat
  • John Walk
  • John White
  • Jose Antonio Coarasa Perez
  • José Luis Vazquez
  • Juha Herrala
  • Julia Andreeva
  • Kerstin Ronneberger
  • Kiril Boyanov
  • Konstantin Skaburskas
  • Ladislav Hluchy
  • Laura Cristiana Voicu
  • Laura Perini
  • Leonardo Arteconi
  • Livia Torterolo
  • Losilla Guillermo Anadon
  • Luciano Milanesi
  • Ludek Matyska
  • Lukasz Skital
  • Luke Dickens
  • Malcolm Atkinson
  • Marc Rodriguez Espadamala
  • Marc-Elian Bégin
  • Marcel Kunze
  • Marcin Plociennik
  • Marco Cecchi
  • Mariusz Sterzel
  • Marko Krznaric
  • Markus Schulz
  • Martin Antony Walker
  • Massimo Lamanna
  • Massimo Marino
  • Miguel Cárdenas Montes
  • Mike Mineter
  • Mikhail Zhizhin
  • Mircea Nicolae Tugulea
  • Monique Petitdidier
  • Muriel Gougerot
  • Nadezda Fialko
  • Nadine Neyroud
  • Nick Brook
  • Nicolas Jacq
  • Nicolas Ray
  • Nils Buss
  • Nuno Santos
  • Osvaldo Gervasi
  • Othmane Bouhali
  • Owen Appleton
  • Pablo Saiz
  • Panagiotis Louridas
  • Pasquale Pagano
  • Patricia Mendez Lorenzo
  • Pawel Wolniewicz
  • Pedro Andrade
  • Peter Kacsuk
  • Peter Praxmarer
  • Philippa Strange
  • Philippe Renard
  • Pier Giovanni Pelfer
  • Pietro Liò
  • Rafael Leiva
  • Remi Mollon
  • Ricardo Brito da Rocha
  • Riccardo di Meo
  • Robert Cohen
  • Roberta Faggian Marque
  • Roberto Barbera
  • Roberto Santinelli
  • Rolandas Naujikas
  • Rolf Kubli
  • Rolf Rumler
  • Romier Genevieve
  • Rosanna Catania
  • Sabine Elles
  • Sandor Suhai
  • Sergio Andreozzi
  • Sergio Fantinel
  • Shkelzen Rugovac
  • Silvano Paoli
  • Simon Lin
  • Simone Campana
  • Soha Maad
  • Stefano Beco
  • Stefano Cozzini
  • Stella Shen
  • Stephan Kindermann
  • Steve Fisher
  • Tao-Sheng Chen
  • Texier Romain
  • Toan Nguyen
  • Todor Gurov
  • Tomasz Szepieniec
  • Tony Calanducci
  • Torsten Antoni
  • Tristan Glatard
  • Valentin Vidic
  • Valerio Venturi
  • Vangelis Floros
  • Vaso Kotroni
  • Venicio Duic
  • Vicente Hernandez
  • Victor Lakhno
  • Viet Tran
  • Vincent Breton
  • Vincent Lefort
  • Vladimir Voznesensky
  • Wei-Long Ueng
  • Ying-Ta Wu
  • Yury Ryabov
  • Ákos Frohner
    • 1:00 PM 2:00 PM
      Lunch 1h
    • 12:30 PM 2:00 PM
      Lunch 1h 30m
    • 2:00 PM 6:30 PM
      2c: Special type of jobs (MPI, SDJ, interactive jobs, ...) - Information systems 40/4-C01



      • 2:00 PM
        Scheduling Interactive Jobs 30m
        1. Introduction

In the 1970s, the transition from batch systems to interactive computing was the enabling tool for the widespread diffusion of advances in IC technology. Grids are facing the same challenge. The exponential coefficients in network performance enable the virtualization and pooling of processors and storage; large-scale user involvement might require seamless integration of grid power into everyday use. In this paper, interaction is a short name for all situations involving a display-action loop, ranging from a code-test-debug process in plain ASCII to computational steering through virtual/augmented reality interfaces, as well as portal access to grid resources or complex and partially local workflows. At various levels, the EGEE HEP and biomedical communities provide examples of the requirement for a turnaround time at the human scale. Section 2 provides experimental evidence of this fact. Virtual machines provide a powerful new layer of abstraction in distributed computing environments. The freedom of scheduling and even migrating an entire OS and its associated computations considerably eases the coexistence of deadline-bound short jobs and long-running batch jobs. The EGEE execution model is not based on such virtual machines, so the scheduling issues must be addressed through the standard middleware components: the broker and the local schedulers. Sections 3 and 4 demonstrate that QoS and fast turnaround times are indeed feasible within these constraints.

2. EGEE usage

The current use of EGEE makes a strong case for specific support for short jobs. Through the analysis of the LB log of a broker, we can propose quantitative data to support this claim. The logged broker has been running successive versions of LCG; the trace covers one year (October 2004 to October 2005), with 66 distinct users and more than 90,000 successful jobs, all production.
This trace provides both the job's intrinsic execution time $t$ (evaluated as the timestamp of event 10/LRMS minus the timestamp of event 8/LRMS) and the makespan $m$, that is, the time from submission to completion (evaluated as the timestamp of event 10/LogMonitor minus the timestamp of event 17/UI). The intrinsic execution time might be overestimated if the sites where the job runs accept concurrent execution. The striking fact is the very large number of extremely short jobs. We call Short Deadline Jobs (SDJ) those with $t$ < 10 minutes, and Medium Jobs (MJ) those with $t$ between ten minutes and one hour. SDJ account for more than 90% of the total number of jobs and consume nearly 20% of the total execution time, in the same range as MJ (17%). Next, we consider the overhead $o = (m-t)/t$. As usual, the overhead decreases with execution time, but for SDJ the overhead is often many orders of magnitude larger than $t$; for MJ, it is of the same order of magnitude as $t$. Thus, the EGEE service for SDJ is seriously insufficient. One could argue that bundling many SDJ into one MJ could lower the overhead. However, interactivity would not be achieved, because the results would also come in a bundle: for graphical interactivity, the results must obviously be pipelined with visualization, and in the test-debug-correct cycle there may not be many jobs to bundle. With respect to grid management, an interactivity situation translates into a QoS requirement: just as video rendering or music playing requires special scheduling on a personal computer, or video streaming requires differentiated network services, servicing SDJ requires a specific grid guarantee, namely a small bound on the makespan, usually known as a deadline in the QoS framework. The overhead has two components: first the queuing time, and second the cost of traversing the middleware protocol stack.
The first issue is related to the grid scheduling policy, while the second is related to its implementation.

3. A Scheduling Policy for SDJ

Deadline scheduling usually relies on breaking the allocation of resources into quanta: time slices for a processor, or packet slots for network routing. For job scheduling, the problem is a priori much more difficult, because jobs are not partitionable: except for checkpointable jobs, a job that has started running cannot be suspended and restarted later. Condor has pioneered migration-based environments, which provide such a feature transparently, but deploying constrained suspension in EGEE would be much too invasive with respect to the existing middleware. Thus, SDJ should not be queued at all, which seems incompatible with the most basic mechanism of grid scheduling policies. The EGEE scheduling policy is largely decentralized: all queues are located on the sites, and the actual time scheduling is enacted by the local schedulers. Most often, these schedulers do not allow time-sharing (except for monitoring). The key to servicing SDJ is to allow controlled time-sharing, which transparently extends the kernel's multiplexing to jobs, through a combination of processor virtualization and permanent slot reservation. The SDJ scheduling system has two components. The first is a local component, composed of dedicated single-entry queues and a configuration of the local scheduler. Technical details can be found at http://egee- . It ensures the following properties: the delay incurred by batch jobs is at most doubled; resource usage is not degraded, e.g. by idling processors; and the policies governing resource sharing (VOs, EGEE and non-EGEE users, ...) are not impacted. The second is a global component, composed of job typing and a mapping policy at the broker level.
While it is easy to ensure that SDJ are directed to resources accepting SDJ, LCG and gLite do not provide the means to prevent non-SDJ jobs from using the SDJ queues, and this requires a minor modification of the broker code. It should be noted that no explicit user reservation is required: seamless integration also means that explicit advance reservation is no more applicable than it would be for accessing a personal computer or a video-on-demand service. In the most frequent case, SDJ will run under the best-effort Linux scheduling policy (SCHED_OTHER); however, if hard real-time constraints must be met, this scheme is fully compatible with preemption (the SCHED_FIFO or SCHED_RR policies). In any case, the limits on resource usage (e.g. as enforced by Maui) implement access control, so a job might be rejected. The WMS notifies the application of the rejection, and the application can decide on the most adequate reaction, for instance submission as a normal job or switching to local computation.

4. User-level scheduling

Recent reports (gLite WMS Test) show an impressively low middleware penalty, on the order of a few seconds, which should be available in gLite 3.0. They also hint that the broker is not too heavily impacted by many simultaneous accesses. However, for ultra-small jobs with execution times of the same order (XXSDJ), even this penalty is too high. Moreover, the notification time remains on the order of minutes. In the gPTM3D project, we have shown that an additional layer of user-level scheduling provides a solution that is fully compatible with the EGEE organization of sharing. The scheduling and execution agents are quite different from those in DIRAC: they do not constitute a permanent overlay, but are launched just like any LCG/gLite job, namely as SDJ jobs; moreover, they work in connected mode, more like glogin-based applications. Beyond this particular case, an open issue is the internal SDJ scheduling.
Consider, for instance, a portal where many users ask for a continuous stream of SDJ executions (whether XXSDJ or regular SDJ). The portal could dynamically launch such scheduling/worker agents and delegate to them the implementation of the so-called (period, slice) model used in soft real-time scheduling.
        Speaker: Cecile Germain-Renaud (LRI and LAL)
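The timestamp arithmetic described in the abstract (intrinsic time t from the LRMS events, makespan m from the UI/LogMonitor events, overhead o = (m - t)/t, and the 10-minute/1-hour SDJ/MJ thresholds) can be sketched as follows; the dictionary layout and event keys are illustrative simplifications of the real LB records:

```python
# Sketch of the SDJ/MJ classification described in the abstract.
# Timestamps are seconds since epoch; the event keys mirror the LB
# events named in the text (illustrative, not the real LB schema).

def classify_job(events):
    """Return (t, m, o, category) for one job's LB event timestamps."""
    t = events["10/LRMS"] - events["8/LRMS"]        # intrinsic execution time
    m = events["10/LogMonitor"] - events["17/UI"]   # makespan: submission -> completion
    o = (m - t) / t                                 # relative middleware overhead
    if t < 10 * 60:
        category = "SDJ"    # Short Deadline Job: t < 10 minutes
    elif t < 60 * 60:
        category = "MJ"     # Medium Job: 10 minutes <= t < 1 hour
    else:
        category = "long"
    return t, m, o, category

# Example: a 2-minute job that spent 20 minutes end to end.
t, m, o, cat = classify_job({
    "17/UI": 0, "8/LRMS": 1000, "10/LRMS": 1120, "10/LogMonitor": 1200,
})
```

For such a job the overhead o is 9.0, i.e. the middleware cost dwarfs the execution time, which is exactly the regime the abstract identifies for SDJ.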
      • 2:30 PM
        Real time computing for financial applications 30m
        Computing grids are quite attractive for large-scale financial applications: this is especially evident in the segment of dynamic financial services, where applications must complete complex tasks within strict deadlines. The traditional response has been to over-provision to make sure there is plenty of 'headroom' in resource availability, thereby keeping large computational resources booked and unused, at great cost in terms of infrastructure. Moreover, some of these complex tasks nowadays need an amount of computing power that is unfeasible to keep in house. Computing grids can deliver the amounts of power needed in such a scenario, but there are still large limitations to overcome. In this brief report we describe the solution we developed to provide real-time computing power through the EGRID facility for a test-case financial application. The test case we consider is an application that estimates the sensitivities of a set of stocks to specific risk factors; technical details about the procedure can be found elsewhere. We present here only the computational details of the application, to better define the problem we faced and the solutions adopted for porting it to the grid. We implemented different technical solutions for our application in a trial-and-error fashion, and we will briefly present all of the attempts. All implemented solutions rely on a "job reservation mechanism": we allocate grid resources in advance to eliminate the latency due to the job submission mechanism. In this way, as soon as we have enough resources allocated, we can interact with them in real time. The drawback is that, being an advance booking strategy, this approach could be unfeasible for "best effort" services. This was not an obstacle for this experimental work, but the limitation should be taken into account when approaching production runs. The booking mechanism has been implemented in the following way.
A bunch of jobs is submitted early to secure the availability of WNs at a given time. Each pooled node executes a program that regularly contacts a host (usually the UI, but not necessarily). As soon as the user runs their program, the contacted host enrolls the WN for it. When the execution terminates, the results are available in real time, without any delay introduced by the grid WMS. The WNs remain booked and are ready to be enrolled again for other program executions; eventually they are freed by the user. This approach, in which the WN asks to be enrolled in a computation and thereby acts as a client, is needed because the WN cannot be reached directly from the UI.
        Speaker: Dr Stefano Cozzini (CNR-INFM Democritos and ICTP)
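The enrollment flow above (pre-booked worker nodes polling a coordinator host and picking up work as soon as the user submits it) can be modelled with a small in-process sketch; the class and function names are invented for illustration, since the real EGRID setup communicates over the network through grid jobs:

```python
# Minimal in-process sketch of the "job reservation" mechanism described
# above: pre-booked worker nodes poll a coordinator (the UI host in the
# abstract) and are enrolled as soon as the user submits work.

import queue

class Coordinator:
    """Stands in for the contacted host (usually the UI)."""
    def __init__(self):
        self.tasks = queue.Queue()
        self.results = []

    def submit(self, payloads):
        """Called by the user: hand work to the booked pool."""
        for p in payloads:
            self.tasks.put(p)

    def poll(self):
        """Called by a booked WN: return a task, or None if idle."""
        try:
            return self.tasks.get_nowait()
        except queue.Empty:
            return None

def worker_loop(coord, compute, max_polls=10):
    """A booked WN repeatedly contacts the coordinator (client role),
    since the WN cannot be reached directly from the UI."""
    for _ in range(max_polls):
        task = coord.poll()
        if task is None:
            continue                       # stay booked, keep polling
        coord.results.append(compute(task))

coord = Coordinator()
coord.submit([1, 2, 3])
worker_loop(coord, lambda x: x * x)
```

The WN-as-client direction is the key design choice: outbound polling works through site firewalls, whereas pushing work directly to a WN would not.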
      • 3:00 PM
        Grid-Enabled Remote Instrumentation with Distributed Control and Computation 30m
        1 GRIDCC Applications and Requirements

The GRIDCC project [1], sponsored by the European Union under contract number 511381 and launched in September 2004, endeavors to integrate scientific and general-purpose instruments within the Grid. The motivation is to exploit the Grid's opportunities for the secure, collaborative work of distributed teams and to utilize the Grid's massive memory and computing resources for the storage and processing of data generated by scientific equipment. The GRIDCC project focuses its attention on eight applications, four of which will be fully integrated, tested and deployed on the Grid. The PowerGrid application will support the remote monitoring and control of thousands of small power generators, while the Control and Monitoring of HEP Experiments application aims to enable remote control and monitoring of the CMS detector at CERN. The (Far) Remote Operation of an Accelerator Facility is an application for the full operation of a remote accelerator in Trieste, Italy, and the Grid-based Intrusion Detection System aims to provide detection and trace-back of flow-based DoS attacks using aggregated data collected from multiple routers. The other set of relevant applications includes meteorology, neurophysiology, the handling of device farms for measurements in telecommunications laboratories, and geophysiology [2][5]. The project, by nature, requires the availability of software components that allow for time-bounded and secure interactions while operating instrumentation in a collaborative environment. In addition to the classical request/response Grid service interaction model, a considerable amount of information needs to be streamed from the instrument back to the user.
The time-bounded interactions, dictated either by the instrument's sensitivity and the accompanying requirement for careful handling and fast response to extreme conditions, or by the applications themselves, lead to the need for SLAs for QoS and other guarantees, with support for compensation and rollback. The idea of collaboration and resource sharing, inherent in the Grid, is also extended and adapted to allow the sharing of unique instruments among users who are geographically dispersed and who normally would not have access to such – usually rare and/or expensive – equipment.

2 GRIDCC and gLite

To cater for the diversity of instruments and the critical nature of the equipment being handled, the GRIDCC middleware platform relies on Web Service (WS) technologies and sustains a Service Level Agreement (SLA) infrastructure, alongside the enforcement of Quality of Service (QoS) guarantees. The GRIDCC middleware architecture is fully described in [2]. A number of gLite software components are extremely relevant to the GRIDCC middleware architecture, which is designed to comprise various novel middleware components that complement them. Firstly, we plan to perform job scheduling and bookkeeping via the WMS, specifically the WMProxy and the LBProxy [2]. We also plan to rely on the Agreement Service for SLA signalling and for triggering resource-level reservations [2] – this is essential to enforce SLA guarantees. In addition, we plan to test and possibly extend CREAM, as explained in the following section. The WSDL interface exposed by the gLite WMS streamlines job submission in a number of different scenarios: direct invocation by the Virtual Control Room (VCR), the GRIDCC portal; direct submission onto preselected CEs via the GRIDCC Workflow Management System (WfMS); and indirect submission, utilising the WMS's built-in scheduling capabilities, either as a single submission or as part of a workflow [2]. The WfMS and VCR are described in more detail in Section 3.
Data gathered from IEs need to be stored in MSS services. Consequently, data storage will be delegated to gLite SEs exposing SRM-compliant interfaces. VOMS and proxy-renewal services will be used. For authentication and authorization, it is foreseen to support both X.509 certificates and the Kerberos framework; the latter will be used when low response times are required. Finally, for monitoring QoS performance as experienced by GRIDCC users and services, we require the integration of service monitoring tools and services providing information about network performance, such as the gLite Network Performance Monitoring framework.

3 GRIDCC Middleware

The gap between GRIDCC's requirements and gLite's existing service support will be filled by a number of GRIDCC solutions, which leverage the existing gLite functionality. The need for instrument support necessitated the development of a new grid component, the Instrument Element (IE). The IE's naming and design reflect its similarity to gLite's SE and CE. The IE provides a Grid interface to a physical instrument or set of instruments, and should allow the user to control the instrument and access instrument data [2]. To cater for the varied needs of instrumentation, the IE also has local automated management and storage capacity [2]. The desired QoS and SLA support is provided by the following Execution Service components. The gLite AS will be extended to establish SLAs with the IE, and the IE will need to enforce such SLAs. To achieve this, the IE conceptual model and schema need to be defined in order to publish information about instrument-specific properties. The GRIDCC Workflow Management System (WfMS) provides an interface for users to submit workflows, which can orchestrate WS calls to underlying services [3]. The WfMS may also need to choreograph further steps into workflows, such as SLA negotiation and logging steps, to facilitate the satisfaction of possibly complex QoS demands from the user [3].
It is also responsible for monitoring running workflows and responding to workflow events, such as contacting a user if QoS demands can no longer be satisfied [2]. The Virtual Control Room (VCR) provides a user Grid portal for the underlying services, in particular to request SLAs from the AS, to steer and monitor an IE, and to submit workflows to the WfMS [2][3][4]. Additionally, the VCR provides a multi-user collaborative online environment in which remote users and support staff share control of and troubleshoot IEs [2][4].

4 Extending gLite

To fulfill the GRIDCC application requirements, a number of gLite functionality extensions would be useful for successful middleware integration. Firstly, information about IEs needs to be made available by the information services. Secondly, in order to enforce upper-bounded execution times, the reservation of CEs and IEs needs to be supported. To this end, we will extend the AS by adding CE- and IE-specific SLA templates. Reservation needs to be triggered and enforced by elements at the fabric layer. For this reason, we envisage the addition of a new operation to the WSDL interface exposed by CREAM, allowing the invocation of reservation operations. As mentioned above, GRIDCC needs QoS to be enforced at both the single-task and workflow levels. The WMS already supports some workflow functionality; however, it can only process workflows involving job execution tasks. We foresee the need to merge the functionality of the GRIDCC WfMS with the gLite WMS, to benefit from the existing WMS capabilities and avoid duplication of work.
References

[1] The GRIDCC Project home page
[2] The GRIDCC Architecture – Architecture of Services for a Grid Enabled Remote Instrument Infrastructure
[3] D4.1 Basic Release R1, GRIDCC Project Deliverable GRIDCC-D4.1, May 2005
[4] Multipurpose Collaborative Environment, GRIDCC Project Deliverable GRIDCC-D5_2, Sept 2005
[5] Specific Targeted Research or Innovation Project – Annex I – "Description of Work", May 2004
[6] EGEE Middleware Architecture and Planning, EGEE Project Deliverable EGEE-DJRA1.1-594698-v1.0, Jul 2005
        Speaker: Luke Dickens (Imperial College)
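The Instrument Element described above (a grid-side facade that lets a user control an instrument, buffers its data locally, and hands the data off for storage) can be sketched as a small class; the method names here are invented for illustration and do not reflect the real GRIDCC interfaces, which are defined in the project deliverables:

```python
# Hypothetical sketch of an Instrument Element (IE) facade as described
# in the abstract. All names are illustrative, not the GRIDCC API.

class InstrumentElement:
    def __init__(self, instrument):
        self.instrument = instrument  # callable returning one measurement
        self.buffer = []              # local storage capacity, per the abstract
        self.running = False

    def control(self, command):
        """Grid interface for controlling the physical instrument."""
        if command == "start":
            self.running = True
        elif command == "stop":
            self.running = False
        return self.running

    def acquire(self):
        """Read one measurement and buffer it locally."""
        if not self.running:
            raise RuntimeError("instrument not started")
        sample = self.instrument()
        self.buffer.append(sample)    # buffered before hand-off to storage
        return sample

    def drain(self):
        """Hand buffered data off (e.g. to an SRM-backed SE) and clear."""
        data, self.buffer = self.buffer, []
        return data

readings = iter([3.1, 3.2])
ie = InstrumentElement(lambda: next(readings))
ie.control("start")
ie.acquire()
ie.acquire()
```

The local buffer mirrors the abstract's point that an IE needs its own management and storage capacity, so that instrument data survive until the gLite SE upload happens.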
      • 3:30 PM
        Efficient job handling in the GRID: short deadline, interactivity, fault tolerance and parallelism 30m
        The major Grid infrastructures are designed mainly for batch-oriented computing with coarse-grained jobs and relatively high job turnaround times. However, many practical applications in the natural and physical sciences may be easily parallelized and run as a set of smaller tasks which require little or no synchronization and which may be scheduled in a more efficient way. The Distributed Analysis Environment Framework (DIANE) is a Master-Worker execution skeleton for applications, which complements the Grid middleware stack. Automatic failure recovery and task dispatching policies enable easy customization of the behaviour of the framework in a dynamic and non-reliable computing environment. We present our experience of using the framework with several diverse real-life applications, including Monte Carlo simulation, physics data analysis and biotechnology. Interfacing existing sequential applications, including legacy applications, is made easy from the point of view of a non-expert user. We analyze the runtime efficiency and load balancing of the parallel tasks in various configurations and diverse computing environments: Grids (LCG, CrossGrid), batch farms and dedicated clusters. In practice, the use of the Master/Worker layer dramatically reduces the job turnaround time, a scenario suitable for short deadline jobs and interactive data analysis. Finally, it is also possible to easily introduce more complex synchronization patterns beyond trivial parallelism, such as arbitrary dependency graphs (including cycles, in contrast to DAGs), which may be suitable for bioinformatics applications.
        Speaker: Mr Jakub MOSCICKI (CERN)
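The Master-Worker pattern with automatic failure recovery that the abstract attributes to DIANE can be sketched in a few lines; this is an illustrative skeleton, not the DIANE API, and the round-robin dispatch and retry limit are assumptions:

```python
# Minimal master/worker skeleton in the spirit of the abstract: the
# master dispatches independent tasks to workers, and a task whose
# worker fails is automatically re-queued (failure recovery).

def run_master(tasks, workers, max_retries=3):
    """Dispatch tasks round-robin; re-queue tasks whose worker fails."""
    pending = [(t, 0) for t in tasks]   # (task, attempts so far)
    results = {}
    i = 0
    while pending:
        task, tries = pending.pop(0)
        worker = workers[i % len(workers)]
        i += 1
        try:
            results[task] = worker(task)
        except Exception:
            if tries + 1 < max_retries:
                pending.append((task, tries + 1))  # automatic recovery
    return results

# A worker that fails on its very first dispatch, then recovers.
flaky_calls = {"count": 0}
def flaky(x):
    flaky_calls["count"] += 1
    if flaky_calls["count"] == 1:
        raise RuntimeError("worker lost")
    return x * 2

results = run_master([1, 2, 3], [flaky])
```

Because tasks are independent and re-dispatchable, a lost worker costs only the re-execution of its current task, which is what makes the scheme effective in a non-reliable environment.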
      • 4:00 PM
        Coffee break 30m
      • 4:30 PM
        Grid Computing and Online Games 30m
        With the fast growth of the video games and entertainment industry - thanks to the appearance of new games, new technologies and innovative hardware devices - the capacity to react quickly becomes critical for competing in the market of services and entertainment. It is therefore necessary to be able to count on advanced middleware solutions and technological platforms that allow the fast deployment of custom-made services. Andago has developed the online games platform Andago Games, which provides the technological base necessary for the creation of online games services around which the main entertainment sites will be able to establish solid business models. The Andago Games platform makes it possible to quickly create online multiplayer games channels with the following services for the end user: * pay per play / pay per subscription * reservation of gaming rooms or servers and advance management of games * advanced statistics * automatic game launch * clans * championships, downloads, chat, etc. However, the platform requires important investments by operators and portals, limiting the number of possible customers. Grid computing will dramatically reduce these investments by sharing resources among different operators and portals. Grid computing also offers the possibility of creating virtual organizations, where operators and portals can share games and content, and even their user bases. Technically, the goal is to be able to share expensive resources between providers and to allow billing based on usage. From a business perspective, our goal is to open new commercial opportunities in the domain of entertainment. A common problem with online games is that operators, portals and games providers would like to share resources and aim at sharing costs to optimize their businesses. Yet business entities are generally required to play all business roles.
The European market is still too fragmented, and it is hard to reach the critical mass of users needed to make online games businesses profitable and to ensure resource liquidity. Having a Grid infrastructure makes it possible to divide tasks among different actors, and in consequence each actor can concentrate on the business it knows best: application developers provide the applications, portal providers create the portals to attract users, and telcos/ISPs provide the required infrastructure. Such Virtual Organisations allow for profitable alliances and resource integration. The outcome of a grid-enabled online games platform will be to provide the middleware to make this collaboration happen. The Grid not only ensures decreasing costs for businesses, but also allows for creating a global European market, as applications, infrastructure and users can be shared independently of political and social borders, smoothly integrated and better exploited. There are also big advantages for users: for example, they will have a larger offering, better quality of service and certainly cheaper services. Centralized grid portals would provide thousands of games and entertainment content items from different providers. Today, if a user buys a new game and wants to play it online, they have to connect to a server (possibly) in the USA, unless a local server has been set up. Having a Grid infrastructure would largely ease that process: users will simply connect to the Grid, play, and join the international community of users. An online games scenario implies strong requirements on QoS for the real-time provision of distributed multimedia content all over the world. Usage monitoring is also quite important, for user profiling and for matching users with content (e.g. preventing underage access to inappropriate content). Privacy, billing and community building are other properties relevant for online games and entertainment.
        Speaker: Mr Rafael Garcia Leiva (Andago Ingenieria)
      • 5:00 PM
        User Applications of R-GMA 30m
        The Relational Grid Monitoring Architecture (R-GMA) provides a uniform method to access and publish both information and monitoring data. It has been designed to make it easy for individuals to publish and retrieve data. It provides information about the grid, mainly for the middleware packages, and information about grid applications for users. From a user's perspective, an R-GMA installation appears as a single virtual database. R-GMA provides a flexible infrastructure in which producers of information can be dynamically created and deleted, and tables can be dynamically added to and removed from a schema. All published data carry a timestamp, enabling their use for monitoring. R-GMA is currently being used for job monitoring, application monitoring, network monitoring, gridFTP monitoring and the site functional tests (SFT). R-GMA is a relational implementation of the Global Grid Forum's (GGF) Grid Monitoring Architecture (GMA). The GMA defines producers and consumers of information and a registry that knows the location of all consumers and producers; R-GMA provides Consumer, Producer, Registry and Schema services. The consumer service allows the user to issue a number of different types of query: history, latest and continuous. History queries are queries over time-sequenced data, and latest queries correspond to the intuitive idea of current information. For a continuous query, new data are broadcast to all subscribed consumers as soon as those data are published via a producer. Consumers are automatically matched with producers of the appropriate type that will satisfy their query. Data published by application code is stored by a producer service. R-GMA provides a producer service that includes primary and secondary producers. Primary producers are the initial source of data within an R-GMA system.
Secondary producers can be used to republish data in order to co-locate information to speed up queries (and allow multi-table queries), to reduce network traffic and to offer different producer properties. It is envisaged that there will be numerous primary producers and one or two secondary producers for each subset of data. Both primary and secondary producers may use memory or a database to store the data and may specify retention periods. Memory producers give the best performance for continuous queries, whereas database producers give the best performance where joins are required. It is not necessary for users to know where other producers and consumers are: this is managed by the local producer and consumer services on behalf of the user. In most cases it is not even necessary to know the location of the local producer and consumer services, as worker nodes and user interface nodes are already configured to point to their local R-GMA producer and consumer services. There are already a number of applications using R-GMA. The first example is job monitoring. There was a requirement to allow grid users to monitor the progress of their jobs and for VO administrators to get an overview of what was happening on the grid. The problems were that the location in which a grid job would end up was not known in advance, and that worker nodes were behind firewalls so they were not accessible remotely. SA1 has adopted the job wrapper approach, as this did not require any changes to the application code. Every job is put in a wrapper that periodically publishes information about the state of the process running the job and its environment. These data are currently being published via the SA1 JobMonitoring table within R-GMA. A second application has been written to run on the resource broker nodes. This application examines the logging and bookkeeping logs and publishes data about the changes in state of grid jobs. 
These data are made available via the SA1 JobStatusRaw table. Both the producer in the job wrapper and the producers on the resource broker nodes make use of R-GMA memory primary producers. A database secondary producer is used to aggregate the data. Other uses of R-GMA include application monitoring, network monitoring and gridFTP monitoring. There are a number of different ways to implement application monitoring including the wrapper approach, as the job monitoring, and instrumentation of the application code. Instrumentation of the code can mean using a logging service, e.g. log4j, which publishes data via R-GMA, or calling R-GMA API methods directly from the application code. The network monitoring group, NA4, have been using R-GMA to publish a number of network metrics. They used memory primary producers in the network sensors to publish the data and a database secondary producer to aggregate the data. SA1 have made use of the consumer service for monitoring grid FTP metrics. They have written a memory primary producer that sits on the gridFTP server nodes and publishes statistics about the file transfers. A continuous consumer is used to pull in all the data to a central location, from where it is written to an Oracle database for analysis. This was used for Service Challenge 3. Two patterns have emerged from the use made of R-GMA for monitoring. In both patterns data is initially published using memory primary producers. These may be short lived and only make the data available for a limited time, e.g. the lifetime of a grid job. In one pattern data are made persistent by using a consumer to populate an external database which applications query directly. In the other pattern, an R-GMA secondary producer is used to make the data persistent and also make it available for querying through R-GMA. 
In the coming months we plan to add support for multiple Virtual Data Bases, authorization within the context of a Virtual Data Base using VOMS attributes, registry replication, load balancing over multiple R-GMA servers and support for Oracle. R-GMA is an information and monitoring system that has been specifically designed for the grid environment. It can be used by systems, VOs and individuals and is already in use in production.
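The producer/consumer/registry pattern and the three query types described above can be sketched as a small in-memory Python model. This is purely illustrative: the class and method names below are invented for this sketch and are not R-GMA's actual API, and real R-GMA queries are expressed in SQL against the virtual database.

```python
import time
from collections import defaultdict

class Registry:
    """Toy stand-in for the R-GMA Registry: tracks which producers
    publish each table and which continuous-query callbacks exist."""
    def __init__(self):
        self.producers = defaultdict(list)   # table name -> [producer]
        self.continuous = defaultdict(list)  # table name -> [callback]

    def register_producer(self, table, producer):
        self.producers[table].append(producer)

    def subscribe(self, table, callback):
        self.continuous[table].append(callback)

class MemoryPrimaryProducer:
    """Toy memory primary producer: stores timestamped rows and pushes
    each new row to subscribed continuous consumers via the registry."""
    def __init__(self, registry, table):
        self.registry = registry
        self.table = table
        self.rows = []                       # list of (timestamp, row)
        registry.register_producer(table, self)

    def insert(self, row, timestamp=None):
        ts = timestamp if timestamp is not None else time.time()
        self.rows.append((ts, row))
        for callback in self.registry.continuous[self.table]:
            callback(ts, row)                # continuous-query broadcast

class Consumer:
    """Toy consumer supporting the three R-GMA query types."""
    def __init__(self, registry):
        self.registry = registry

    def history(self, table):
        """History query: all time-sequenced data, oldest first."""
        out = []
        for producer in self.registry.producers[table]:
            out.extend(producer.rows)
        return sorted(out, key=lambda pair: pair[0])

    def latest(self, table, key):
        """Latest query: the most recent row per value of `key`."""
        newest = {}
        for ts, row in self.history(table):
            newest[row[key]] = row           # later rows overwrite earlier
        return newest

    def continuous(self, table, callback):
        """Continuous query: new rows delivered as they are published."""
        self.registry.subscribe(table, callback)

# Usage, loosely modelled on the job-monitoring example above
# (table and column names are illustrative, not the SA1 schema):
reg = Registry()
jobs = MemoryPrimaryProducer(reg, "JobMonitoring")
seen = []
Consumer(reg).continuous("JobMonitoring", lambda ts, row: seen.append(row))
jobs.insert({"jobId": "42", "state": "RUNNING"}, timestamp=1.0)
jobs.insert({"jobId": "42", "state": "DONE"}, timestamp=2.0)
```

A secondary producer would fit this model as a component that issues a continuous query and re-inserts the received rows into its own (database-backed) store, which is how the aggregation step above works.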
        Speaker: Dr Steve Fisher (RAL)
      • 5:30 PM
        Final discussion on the session topics 1h
    • 1:00 PM 2:00 PM
      Lunch 1h