EGEE User Forum




The EGEE (Enabling Grids for E-sciencE) project provides the largest production grid infrastructure for applications. In the first two years of the project, an increasing number of diverse user communities have been attracted by the possibilities offered by EGEE and have joined the initial user communities. The EGEE user community feels it is now appropriate to meet, to share experiences, and to set new targets for the future, including both the evolution of existing applications and the development and deployment of new applications on the EGEE infrastructure.

The EGEE User Forum will provide an important opportunity for innovative applications to establish contacts with EGEE and with other user communities, to plan for future usage of the EGEE grid infrastructure, to learn about the latest advances, and to discuss the future evolution of the grid middleware. The main goal is to create a dynamic user community, starting from the base of existing users, that can increase the effectiveness of current EGEE applications and promote the fast and efficient uptake of grid technology by new disciplines. EGEE fosters pioneering usage of its infrastructure by encouraging collaboration between diverse scientific disciplines; it does this to evolve and expand the services offered to the EGEE user community, maximising the scientific, technological and economic relevance of grid-based activities.

We would like to invite hands-on users of the EGEE Grid Infrastructure to submit an abstract for this event, following the suggested template.

EGEE User Forum Web Page
  • Adrian Vataman
  • Alastair Duncan
  • Alberto Falzone
  • Alberto Ribon
  • Ales Krenek
  • Alessandro Comunian
  • Alexandru Tudose
  • Alexey Poyda
  • Algimantas Juozapavicius
  • Alistair Mills
  • Alvaro del Castillo San Felix
  • Andrea Barisani
  • Andrea Caltroni
  • Andrea Ferraro
  • Andrea Manzi
  • Andrea Rodolico
  • Andrea Sciabà
  • Andreas Gisel
  • Andreas-Joachim Peters
  • Andrew Maier
  • Andrey Kiryanov
  • Aneta Karaivanova
  • Antonio Almeida
  • Antonio De la Fuente
  • Antonio Laganà
  • Antony Wilson
  • Arnaud Pierson
  • Arnold Meijster
  • Benjamin Gaidioz
  • Beppe Ugolotti
  • Birger Koblitz
  • Bjorn Engsig
  • Bob Jones
  • Boon Low
  • Catalin Cirstoiu
  • Cecile Germain-Renaud
  • Charles Loomis
  • Chollet Frédérique
  • Christian Saguez
  • Christoph Langguth
  • Christophe Blanchet
  • Christophe Pera
  • Claudio Arlandini
  • Claudio Grandi
  • Claudio Vella
  • Claudio Vuerli
  • Claus Jacobs
  • Craig Munro
  • Cristian Dittamo
  • Cyril L'Orphelin
  • Daniel Jouvenot
  • Daniel Lagrava
  • Daniel Rodrigues
  • David Colling
  • David Fergusson
  • David Horn
  • David Smith
  • David Weissenbach
  • Davide Bernardini
  • Dezso Horvath
  • Dieter Kranzlmüller
  • Dietrich Liko
  • Dmitry Mishin
  • Doina Banciu
  • Domenico Vicinanza
  • Dominique Hausser
  • Eike Jessen
  • Elena Slabospitskaya
  • Elena Tikhonenko
  • Elisabetta Ronchieri
  • Emanouil Atanassov
  • Eric Yen
  • Erwin Laure
  • Esther Acción García
  • Ezio Corso
  • Fabrice Bellet
  • Fabrizio Pacini
  • Federica Fanzago
  • Fernando Felix-Redondo
  • Flavia Donno
  • Florian Urmetzer
  • Florida Estrella
  • Fokke Dijkstra
  • Fotis Georgatos
  • Fotis Karayannis
  • Francesco Giacomini
  • Francisco Casatejón
  • Frank Harris
  • Frederic Hemmer
  • Gael Youinou
  • Gaetano Maron
  • Gavin McCance
  • Gergely Sipos
  • Giorgio Maggi
  • Giorgio Pauletto
  • Giovanna Stancanelli
  • Giuliano Pelfer
  • Giuliano Taffoni
  • Giuseppe Andronico
  • Giuseppe Codispoti
  • Hannah Cumming
  • Hannelore Hammerle
  • Hans Gankema
  • Harald Kornmayer
  • Horst Schwichtenberg
  • Huard Helene
  • Hurng-Chun Lee
  • Ian Bird
  • Ignacio Blanquer
  • Ilyin Slava
  • Iosif Legrand
  • Isabel Campos Plasencia
  • Isabelle Magnin
  • Jacq Florence
  • Jakub Moscicki
  • Jan Kmunicek
  • Jan Svec
  • Jaouher Kerrou
  • Jean Salzemann
  • Jean-Pierre Prost
  • Jeremy Coles
  • Jiri Kosina
  • Joachim Biercamp
  • Johan Montagnat
  • John Walk
  • John White
  • Jose Antonio Coarasa Perez
  • José Luis Vazquez
  • Juha Herrala
  • Julia Andreeva
  • Kerstin Ronneberger
  • Kiril Boyanov
  • Konstantin Skaburskas
  • Ladislav Hluchy
  • Laura Cristiana Voicu
  • Laura Perini
  • Leonardo Arteconi
  • Livia Torterolo
  • Losilla Guillermo Anadon
  • Luciano Milanesi
  • Ludek Matyska
  • Lukasz Skital
  • Luke Dickens
  • Malcolm Atkinson
  • Marc Rodriguez Espadamala
  • Marc-Elian Bégin
  • Marcel Kunze
  • Marcin Plociennik
  • Marco Cecchi
  • Mariusz Sterzel
  • Marko Krznaric
  • Markus Schulz
  • Martin Antony Walker
  • Massimo Lamanna
  • Massimo Marino
  • Miguel Cárdenas Montes
  • Mike Mineter
  • Mikhail Zhizhin
  • Mircea Nicolae Tugulea
  • Monique Petitdidier
  • Muriel Gougerot
  • Nadezda Fialko
  • Nadine Neyroud
  • Nick Brook
  • Nicolas Jacq
  • Nicolas Ray
  • Nils Buss
  • Nuno Santos
  • Osvaldo Gervasi
  • Othmane Bouhali
  • Owen Appleton
  • Pablo Saiz
  • Panagiotis Louridas
  • Pasquale Pagano
  • Patricia Mendez Lorenzo
  • Pawel Wolniewicz
  • Pedro Andrade
  • Peter Kacsuk
  • Peter Praxmarer
  • Philippa Strange
  • Philippe Renard
  • Pier Giovanni Pelfer
  • Pietro Liò
  • Rafael Leiva
  • Remi Mollon
  • Ricardo Brito da Rocha
  • Riccardo di Meo
  • Robert Cohen
  • Roberta Faggian Marque
  • Roberto Barbera
  • Roberto Santinelli
  • Rolandas Naujikas
  • Rolf Kubli
  • Rolf Rumler
  • Romier Genevieve
  • Rosanna Catania
  • Sabine Elles
  • Sandor Suhai
  • Sergio Andreozzi
  • Sergio Fantinel
  • Shkelzen Rugovac
  • Silvano Paoli
  • Simon Lin
  • Simone Campana
  • Soha Maad
  • Stefano Beco
  • Stefano Cozzini
  • Stella Shen
  • Stephan Kindermann
  • Steve Fisher
  • Tao-Sheng Chen
  • Texier Romain
  • Toan Nguyen
  • Todor Gurov
  • Tomasz Szepieniec
  • Tony Calanducci
  • Torsten Antoni
  • Tristan Glatard
  • Valentin Vidic
  • Valerio Venturi
  • Vangelis Floros
  • Vaso Kotroni
  • Venicio Duic
  • Vicente Hernandez
  • Victor Lakhno
  • Viet Tran
  • Vincent Breton
  • Vincent Lefort
  • Vladimir Voznesensky
  • Wei-Long Ueng
  • Ying-Ta Wu
  • Yury Ryabov
  • Ákos Frohner
    • 1:00 PM 2:00 PM
      Lunch 1h
    • 12:30 PM 2:00 PM
      Lunch 1h 30m
    • 2:00 PM 6:30 PM
      2c: Special type of jobs (MPI, SDJ, interactive jobs, ...) - Information systems 40/4-C01



      • 2:00 PM
        Scheduling Interactive Jobs 30m
        1. Introduction

In the 1970s, the transition from batch systems to interactive computing was the enabling tool for the widespread diffusion of advances in IC technology. Grids are facing the same challenge. The exponential coefficients in network performance enable the virtualization and pooling of processors and storage; large-scale user involvement might require seamless integration of grid power into everyday use. In this paper, interaction is a short name for all situations involving a display-action loop, ranging from a code-test-debug process in plain ASCII to computational steering through virtual/augmented reality interfaces, as well as portal access to grid resources or complex and partially local workflows. At various levels, the EGEE HEP and biomedical communities provide examples of the requirement for a turnaround time at the human scale. Section 2 provides experimental evidence of this fact. Virtual machines provide a powerful new layer of abstraction in distributed computing environments. The freedom of scheduling and even migrating an entire OS and its associated computations considerably eases the coexistence of deadline-bound short jobs and long-running batch jobs. The EGEE execution model is not based on such virtual machines, so the scheduling issues must be addressed through the standard middleware components: the broker and the local schedulers. Sections 3 and 4 demonstrate that QoS and fast turnaround times are indeed feasible within these constraints.

2. EGEE usage

The current use of EGEE makes a strong case for specific support for short jobs. Through the analysis of the LB log of a broker, we can propose quantitative data to support this claim. The logged broker has been running successive versions of LCG; the trace covers one year (October 2004 to October 2005), with 66 distinct users and more than 90,000 successful jobs, all production.
This trace provides both the job's intrinsic execution time $t$ (evaluated as the timestamp of event 10/LRMS minus the timestamp of event 8/LRMS) and the makespan $m$, that is, the time from submission to completion (evaluated as the timestamp of event 10/LogMonitor minus the timestamp of event 17/UI). The intrinsic execution time might be overestimated if the sites where the job runs accept concurrent execution. The striking fact is the very large number of extremely short jobs. We call Short Deadline Jobs (SDJ) those with $t$ < 10 minutes, and Medium Jobs (MJ) those with $t$ between ten minutes and one hour. SDJ account for more than 90% of the total number of jobs and consume nearly 20% of the total execution time, in the same range as MJ (17%). Next, we consider the overhead $o = (m-t)/t$. As usual, the overhead decreases with execution time, but for SDJ the overhead is often many orders of magnitude larger than $t$; for MJ, it is of the same order of magnitude as $t$. Thus, the EGEE service for SDJ is seriously insufficient. One could argue that bundling many SDJ into one MJ could lower the overhead. However, interactivity would not be achieved, because the results would also come in a bundle: for graphical interactivity, the results must obviously be pipelined with visualization, and in the test-debug-correct cycle there may not be many jobs to bundle. With respect to grid management, an interactivity situation translates into a QoS requirement: just as video rendering or music playing requires special scheduling on a personal computer, or video streaming requires differentiated network services, servicing SDJ requires a specific grid guarantee, namely a small bound on the makespan, usually known as a deadline in the QoS framework. The overhead has two components: first the queuing time, and second the cost of traversing the middleware protocol stack.
The first issue is related to the grid scheduling policy, while the second is related to its implementation.

3. A Scheduling Policy for SDJ

Deadline scheduling usually relies on breaking the allocation of resources into quanta: time slices for a processor, or packet slots for network routing. For job scheduling, the problem is a priori much more difficult, because jobs are not partitionable: except for checkpointable jobs, a job that has started running cannot be suspended and restarted later. Condor has pioneered migration-based environments, which provide such a feature transparently, but deploying constrained suspension in EGEE would be much too invasive with respect to the existing middleware. Thus, SDJ should not be queued at all, which seems incompatible with the most basic mechanism of grid scheduling policies. The EGEE scheduling policy is largely decentralized: all queues are located on the sites, and the actual time scheduling is enacted by the local schedulers. Most often, these schedulers do not allow time-sharing (except for monitoring). The key to servicing SDJ is to allow controlled time-sharing, which transparently extends the kernel's multiplexing to jobs, through a combination of processor virtualization and permanent slot reservation. The SDJ scheduling system has two components. The first is a local component, composed of dedicated single-entry queues and a configuration of the local scheduler. Technical details can be found at http://egee- . It ensures the following properties: the delay incurred by batch jobs is at most doubled; resource usage is not degraded, e.g. by idling processors; and the policies governing resource sharing (VOs, EGEE and non-EGEE users, ...) are not impacted. The second is a global component, composed of job typing and a mapping policy at the broker level.
While it is easy to ensure that SDJ are directed to resources accepting SDJ, LCG and gLite do not provide the means to prevent non-SDJ jobs from using the SDJ queues, and this requires a minor modification of the broker code. It should be noted that no explicit user reservation is required: seamless integration also means that explicit advance reservation is no more applicable than it would be for accessing a personal computer or a video-on-demand service. In the most frequent case, SDJ will run under the best-effort Linux scheduling policy (SCHED_OTHER); however, if hard real-time constraints must be met, this scheme is fully compatible with preemption (the SCHED_FIFO or SCHED_RR policies). In any case, the limits on resource usage (e.g. as enforced by Maui) implement access control, so a job might be rejected. The WMS notifies the application of the rejection, and the application can decide on the most adequate reaction, for instance submission as a normal job or switching to local computation.

4. User-level scheduling

Recent reports (gLite WMS Test) show an impressively low middleware penalty, on the order of a few seconds, which should be available in gLite 3.0. They also hint that the broker is not too heavily impacted by many simultaneous accesses. However, for ultra-small jobs with execution times of the same order (XXSDJ), even this penalty is too high. Moreover, the notification time remains on the order of minutes. In the gPTM3D project, we have shown that an additional layer of user-level scheduling provides a solution that is fully compatible with the EGEE organization of sharing. The scheduling and execution agents are quite different from those in DIRAC: they do not constitute a permanent overlay, but are launched just like any LCG/gLite job, namely as SDJ jobs; moreover, they work in connected mode, more like glogin-based applications. Beyond this particular case, an open issue is the internal SDJ scheduling.
Consider, for instance, a portal where many users ask for a continuous stream of SDJ executions (whether XXSDJ or regular SDJ). The portal could dynamically launch such scheduling/worker agents and delegate to them the implementation of the so-called (period, slice) model used in soft real-time scheduling.
        Speaker: Cecile Germain-Renaud (LRI and LAL)
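The timestamp arithmetic described in the abstract (intrinsic time t from the LRMS events, makespan m from the UI/LogMonitor events, overhead o = (m - t)/t, and the 10-minute/1-hour SDJ/MJ thresholds) can be sketched as follows; the dictionary layout and event keys are illustrative simplifications of the real LB records:

```python
# Sketch of the SDJ/MJ classification described in the abstract.
# Timestamps are seconds since epoch; the event keys mirror the LB
# events named in the text (illustrative, not the real LB schema).

def classify_job(events):
    """Return (t, m, o, category) for one job's LB event timestamps."""
    t = events["10/LRMS"] - events["8/LRMS"]        # intrinsic execution time
    m = events["10/LogMonitor"] - events["17/UI"]   # makespan: submission -> completion
    o = (m - t) / t                                 # relative middleware overhead
    if t < 10 * 60:
        category = "SDJ"    # Short Deadline Job: t < 10 minutes
    elif t < 60 * 60:
        category = "MJ"     # Medium Job: 10 minutes <= t < 1 hour
    else:
        category = "long"
    return t, m, o, category

# Example: a 2-minute job that spent 20 minutes end to end.
t, m, o, cat = classify_job({
    "17/UI": 0, "8/LRMS": 1000, "10/LRMS": 1120, "10/LogMonitor": 1200,
})
```

For such a job the overhead o is 9.0, i.e. the middleware cost dwarfs the execution time, which is exactly the regime the abstract identifies for SDJ.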
      • 2:30 PM
        Real time computing for financial applications 30m
        Computing grids are quite attractive for large-scale financial applications: this is especially evident in the segment of dynamic financial services, where applications must complete complex tasks within strict deadlines. The traditional response has been to over-provision to make sure there is plenty of 'headroom' in resource availability, thereby keeping large computational resources booked and unused, at great cost in terms of infrastructure. Moreover, some of these complex tasks nowadays need an amount of computing power that is unfeasible to keep in house. Computing grids can deliver the amounts of power needed in such a scenario, but there are still large limitations to overcome. In this brief report we describe the solution we developed to provide real-time computing power through the EGRID facility for a test-case financial application. The test case we consider is an application that estimates the sensitivities of a set of stocks to specific risk factors; technical details about the procedure can be found elsewhere. We present here only the computational details of the application, to better define the problem we faced and the solutions adopted for porting it to the grid. We implemented different technical solutions for our application in a trial-and-error fashion, and we will briefly present all of the attempts. All implemented solutions rely on a "job reservation mechanism": we allocate grid resources in advance to eliminate the latency due to the job submission mechanism. In this way, as soon as we have enough resources allocated, we can interact with them in real time. The drawback is that, being an advance booking strategy, this approach could be unfeasible for "best effort" services. This was not an obstacle for this experimental work, but the limitation should be taken into account when approaching production runs. The booking mechanism has been implemented in the following way.
A bunch of jobs is submitted early to secure the availability of WNs at a given time. Each pooled node executes a program that regularly contacts a host (usually the UI, but not necessarily). As soon as the user runs their program, the contacted host enrolls the WN for it. When the execution terminates, the results are available in real time, without any delay introduced by the grid WMS. The WNs remain booked and are ready to be enrolled again for other program executions; eventually they are freed by the user. This approach, in which the WN asks to be enrolled in a computation and thereby acts as a client, is needed because the WN cannot be reached directly from the UI.
        Speaker: Dr Stefano Cozzini (CNR-INFM Democritos and ICTP)
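The enrollment flow above (pre-booked worker nodes polling a coordinator host and picking up work as soon as the user submits it) can be modelled with a small in-process sketch; the class and function names are invented for illustration, since the real EGRID setup communicates over the network through grid jobs:

```python
# Minimal in-process sketch of the "job reservation" mechanism described
# above: pre-booked worker nodes poll a coordinator (the UI host in the
# abstract) and are enrolled as soon as the user submits work.

import queue

class Coordinator:
    """Stands in for the contacted host (usually the UI)."""
    def __init__(self):
        self.tasks = queue.Queue()
        self.results = []

    def submit(self, payloads):
        """Called by the user: hand work to the booked pool."""
        for p in payloads:
            self.tasks.put(p)

    def poll(self):
        """Called by a booked WN: return a task, or None if idle."""
        try:
            return self.tasks.get_nowait()
        except queue.Empty:
            return None

def worker_loop(coord, compute, max_polls=10):
    """A booked WN repeatedly contacts the coordinator (client role),
    since the WN cannot be reached directly from the UI."""
    for _ in range(max_polls):
        task = coord.poll()
        if task is None:
            continue                       # stay booked, keep polling
        coord.results.append(compute(task))

coord = Coordinator()
coord.submit([1, 2, 3])
worker_loop(coord, lambda x: x * x)
```

The WN-as-client direction is the key design choice: outbound polling works through site firewalls, whereas pushing work directly to a WN would not.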
      • 3:00 PM
        Grid-Enabled Remote Instrumentation with Distributed Control and Computation 30m
        1 GRIDCC Applications and Requirements

The GRIDCC project [1], sponsored by the European Union under contract number 511381 and launched in September 2004, endeavors to integrate scientific and general-purpose instruments within the Grid. The motivation is to exploit the Grid's opportunities for the secure, collaborative work of distributed teams and to utilize the Grid's massive memory and computing resources for the storage and processing of data generated by scientific equipment. The GRIDCC project focuses its attention on eight applications, four of which will be fully integrated, tested and deployed on the Grid. The PowerGrid application will support the remote monitoring and control of thousands of small power generators, while the Control and Monitoring of HEP Experiments application aims to enable remote control and monitoring of the CMS detector at CERN. The (Far) Remote Operation of an Accelerator Facility is an application for the full operation of a remote accelerator in Trieste, Italy, and the Grid-based Intrusion Detection System aims to provide detection and trace-back of flow-based DoS attacks using aggregated data collected from multiple routers. The other set of relevant applications includes meteorology, neurophysiology, the handling of device farms for measurements in telecommunications laboratories, and geophysiology [2][5]. The project, by nature, requires the availability of software components that allow for time-bounded and secure interactions while operating instrumentation in a collaborative environment. In addition to the classical request/response Grid service interaction model, a considerable amount of information needs to be streamed from the instrument back to the user.
The time-bounded interactions, dictated either by the instrument's sensitivity and the accompanying requirement for careful handling and fast response to extreme conditions, or by the applications themselves, lead to the need for SLAs for QoS and other guarantees, with support for compensation and rollback. The idea of collaboration and resource sharing, inherent in the Grid, is also extended and adapted to allow the sharing of unique instruments among users who are geographically dispersed and who normally would not have access to such – usually rare and/or expensive – equipment.

2 GRIDCC and gLite

To cater for the diversity of instruments and the critical nature of the equipment being handled, the GRIDCC middleware platform relies on Web Service (WS) technologies and sustains a Service Level Agreement (SLA) infrastructure, alongside the enforcement of Quality of Service (QoS) guarantees. The GRIDCC middleware architecture is fully described in [2]. A number of gLite software components are extremely relevant to the GRIDCC middleware architecture, which is designed to comprise various novel middleware components that complement them. Firstly, we plan to perform job scheduling and bookkeeping via the WMS, specifically the WMProxy and the LBProxy [2]. We also plan to rely on the Agreement Service for SLA signalling and for triggering resource-level reservations [2] – this is essential to enforce SLA guarantees. In addition, we plan to test and possibly extend CREAM, as explained in the following section. The WSDL interface exposed by the gLite WMS streamlines job submission in a number of different scenarios: direct invocation by the Virtual Control Room (VCR), the GRIDCC portal; direct submission onto preselected CEs via the GRIDCC Workflow Management System (WfMS); and indirect submission, utilising the WMS's built-in scheduling capabilities, either as a single submission or as part of a workflow [2]. The WfMS and VCR are described in more detail in Section 3.
Data gathered from IEs need to be stored in MSS services. Consequently, data storage will be delegated to gLite SEs exposing SRM-compliant interfaces. VOMS and proxy-renewal services will be used. For authentication and authorization, it is foreseen to support both X.509 certificates and the Kerberos framework; the latter will be used when low response times are required. Finally, for monitoring QoS performance as experienced by GRIDCC users and services, we require the integration of service monitoring tools and services providing information about network performance, such as the gLite Network Performance Monitoring framework.

3 GRIDCC Middleware

The gap between GRIDCC's requirements and gLite's existing service support will be filled by a number of GRIDCC solutions, which leverage the existing gLite functionality. The need for instrument support necessitated the development of a new grid component, the Instrument Element (IE). The IE's naming and design reflect its similarity to gLite's SE and CE. The IE provides a Grid interface to a physical instrument or set of instruments, and should allow the user to control the instrument and access instrument data [2]. To cater for the varied needs of instrumentation, the IE also has local automated management and storage capacity [2]. The desired QoS and SLA support is provided by the following Execution Service components. The gLite AS will be extended to establish SLAs with the IE, and the IE will need to enforce such SLAs. To achieve this, the IE conceptual model and schema need to be defined in order to publish information about instrument-specific properties. The GRIDCC Workflow Management System (WfMS) provides an interface for users to submit workflows, which can orchestrate WS calls to underlying services [3]. The WfMS may also need to choreograph further steps into workflows, such as SLA negotiation and logging steps, to facilitate the satisfaction of possibly complex QoS demands from the user [3].
It is also responsible for monitoring running workflows and responding to workflow events, such as contacting a user if QoS demands can no longer be satisfied [2]. The Virtual Control Room (VCR) provides a user Grid portal for the underlying services, in particular to request SLAs from the AS, to steer and monitor an IE, and to submit workflows to the WfMS [2][3][4]. Additionally, the VCR provides a multi-user collaborative online environment in which remote users and support staff share control of and troubleshoot IEs [2][4].

4 Extending gLite

To fulfill the GRIDCC application requirements, a number of gLite functionality extensions would be useful for successful middleware integration. Firstly, information about IEs needs to be made available by the information services. Secondly, in order to enforce upper-bounded execution times, the reservation of CEs and IEs needs to be supported. To this end, we will extend the AS by adding CE- and IE-specific SLA templates. Reservation needs to be triggered and enforced by elements at the fabric layer. For this reason, we envisage the addition of a new operation to the WSDL interface exposed by CREAM, allowing the invocation of reservation operations. As mentioned above, GRIDCC needs QoS to be enforced at both the single-task and workflow levels. The WMS already supports some workflow functionality; however, it can only process workflows involving job execution tasks. We foresee the need to merge the functionality of the GRIDCC WfMS with the gLite WMS, to benefit from the existing WMS capabilities and avoid duplication of work.
References

[1] The GRIDCC Project home page
[2] The GRIDCC Architecture – Architecture of Services for a Grid Enabled Remote Instrument Infrastructure
[3] D4.1 Basic Release R1, GRIDCC Project Deliverable GRIDCC-D4.1, May 2005
[4] Multipurpose Collaborative Environment, GRIDCC Project Deliverable GRIDCC-D5_2, Sept 2005
[5] Specific Targeted Research or Innovation Project – Annex I – "Description of Work", May 2004
[6] EGEE Middleware Architecture and Planning, EGEE Project Deliverable EGEE-DJRA1.1-594698-v1.0, Jul 2005
        Speaker: Luke Dickens (Imperial College)
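The Instrument Element described above (a grid-side facade that lets a user control an instrument, buffers its data locally, and hands the data off for storage) can be sketched as a small class; the method names here are invented for illustration and do not reflect the real GRIDCC interfaces, which are defined in the project deliverables:

```python
# Hypothetical sketch of an Instrument Element (IE) facade as described
# in the abstract. All names are illustrative, not the GRIDCC API.

class InstrumentElement:
    def __init__(self, instrument):
        self.instrument = instrument  # callable returning one measurement
        self.buffer = []              # local storage capacity, per the abstract
        self.running = False

    def control(self, command):
        """Grid interface for controlling the physical instrument."""
        if command == "start":
            self.running = True
        elif command == "stop":
            self.running = False
        return self.running

    def acquire(self):
        """Read one measurement and buffer it locally."""
        if not self.running:
            raise RuntimeError("instrument not started")
        sample = self.instrument()
        self.buffer.append(sample)    # buffered before hand-off to storage
        return sample

    def drain(self):
        """Hand buffered data off (e.g. to an SRM-backed SE) and clear."""
        data, self.buffer = self.buffer, []
        return data

readings = iter([3.1, 3.2])
ie = InstrumentElement(lambda: next(readings))
ie.control("start")
ie.acquire()
ie.acquire()
```

The local buffer mirrors the abstract's point that an IE needs its own management and storage capacity, so that instrument data survive until the gLite SE upload happens.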
      • 3:30 PM
        Efficient job handling in the GRID: short deadline, interactivity, fault tolerance and parallelism 30m
        The major Grid infrastructures are designed mainly for batch-oriented computing with coarse-grained jobs and relatively high job turnaround times. However, many practical applications in the natural and physical sciences may be easily parallelized and run as a set of smaller tasks which require little or no synchronization and which may be scheduled in a more efficient way. The Distributed Analysis Environment Framework (DIANE) is a Master-Worker execution skeleton for applications, which complements the Grid middleware stack. Automatic failure recovery and task dispatching policies enable easy customization of the behaviour of the framework in a dynamic and non-reliable computing environment. We present our experience of using the framework with several diverse real-life applications, including Monte Carlo simulation, physics data analysis and biotechnology. Interfacing existing sequential applications, including legacy applications, is made easy from the point of view of a non-expert user. We analyze the runtime efficiency and load balancing of the parallel tasks in various configurations and diverse computing environments: Grids (LCG, CrossGrid), batch farms and dedicated clusters. In practice, the use of the Master/Worker layer dramatically reduces the job turnaround time, a scenario suitable for short deadline jobs and interactive data analysis. Finally, it is also possible to easily introduce more complex synchronization patterns beyond trivial parallelism, such as arbitrary dependency graphs (including cycles, in contrast to DAGs), which may be suitable for bioinformatics applications.
        Speaker: Mr Jakub MOSCICKI (CERN)
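The Master-Worker pattern with automatic failure recovery that the abstract attributes to DIANE can be sketched in a few lines; this is an illustrative skeleton, not the DIANE API, and the round-robin dispatch and retry limit are assumptions:

```python
# Minimal master/worker skeleton in the spirit of the abstract: the
# master dispatches independent tasks to workers, and a task whose
# worker fails is automatically re-queued (failure recovery).

def run_master(tasks, workers, max_retries=3):
    """Dispatch tasks round-robin; re-queue tasks whose worker fails."""
    pending = [(t, 0) for t in tasks]   # (task, attempts so far)
    results = {}
    i = 0
    while pending:
        task, tries = pending.pop(0)
        worker = workers[i % len(workers)]
        i += 1
        try:
            results[task] = worker(task)
        except Exception:
            if tries + 1 < max_retries:
                pending.append((task, tries + 1))  # automatic recovery
    return results

# A worker that fails on its very first dispatch, then recovers.
flaky_calls = {"count": 0}
def flaky(x):
    flaky_calls["count"] += 1
    if flaky_calls["count"] == 1:
        raise RuntimeError("worker lost")
    return x * 2

results = run_master([1, 2, 3], [flaky])
```

Because tasks are independent and re-dispatchable, a lost worker costs only the re-execution of its current task, which is what makes the scheme effective in a non-reliable environment.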
      • 4:00 PM
        Coffee break 30m
      • 4:30 PM
        Grid Computing and Online Games 30m
        With the fast growth of the video games and entertainment industry - thanks to the appearance of new games, new technologies and innovative hardware devices - the capacity to react quickly becomes critical for competing in the market of services and entertainment. It is therefore necessary to be able to count on advanced middleware solutions and technological platforms that allow the fast deployment of custom-made services. Andago has developed the online games platform Andago Games, which provides the technological base necessary for the creation of online games services around which the main entertainment sites will be able to establish solid business models. The Andago Games platform makes it possible to quickly create online multiplayer games channels with the following services for the end user: * pay per play / pay per subscription * reservation of gaming rooms or servers and advance management of games * advanced statistics * automatic game launch * clans * championships, downloads, chat, etc. However, the platform requires important investments by operators and portals, limiting the number of possible customers. Grid computing will dramatically reduce these investments by sharing resources among different operators and portals. Grid computing also offers the possibility of creating virtual organizations, where operators and portals can share games and content, and even their user bases. Technically, the goal is to be able to share expensive resources between providers and to allow billing based on usage. From a business perspective, our goal is to open new commercial opportunities in the domain of entertainment. A common problem with online games is that operators, portals and games providers would like to share resources and aim at sharing costs to optimize their businesses. Yet business entities are generally required to play all business roles.
The European market is still too fragmented, and it is hard to reach the critical mass of users needed to make online games businesses profitable and to ensure resource liquidity. Having a Grid infrastructure makes it possible to divide tasks among different actors, and in consequence each actor can concentrate on the business it knows best: application developers provide the applications, portal providers create the portals to attract users, and telcos/ISPs provide the required infrastructure. Such Virtual Organisations allow for profitable alliances and resource integration. The outcome of a grid-enabled online games platform will be to provide the middleware to make this collaboration happen. The Grid not only ensures decreasing costs for businesses, but also allows for creating a global European market, as applications, infrastructure and users can be shared independently of political and social borders, smoothly integrated and better exploited. There are also big advantages for users: for example, they will have a larger offering, better quality of service and certainly cheaper services. Centralized grid portals would provide thousands of games and entertainment content items from different providers. Today, if a user buys a new game and wants to play it online, they have to connect to a server (possibly) in the USA, unless a local server has been set up. Having a Grid infrastructure would largely ease that process: users will simply connect to the Grid, play, and join the international community of users. An online games scenario implies strong requirements on QoS for the real-time provision of distributed multimedia content all over the world. Usage monitoring is also quite important, for user profiling and for matching users with content (e.g. preventing underage access to inappropriate content). Privacy, billing and community building are other properties relevant for online games and entertainment.
        Speaker: Mr Rafael Garcia Leiva (Andago Ingenieria)
      • 5:00 PM
        User Applications of R-GMA 30m
        The Relational Grid Monitoring Architecture (R-GMA) provides a uniform method to access and publish both information and monitoring data. It has been designed to make it easy for individuals to publish and retrieve data. It provides information about the grid, mainly for the middleware packages, and information about grid applications for users. From a user's perspective, an R-GMA installation appears as a single virtual database. R-GMA provides a flexible infrastructure in which producers of information can be dynamically created and deleted, and tables can be dynamically added to and removed from a schema. All published data carry a timestamp, enabling their use for monitoring. R-GMA is currently being used for job monitoring, application monitoring, network monitoring, gridFTP monitoring and the site functional tests (SFT). R-GMA is a relational implementation of the Global Grid Forum's (GGF) Grid Monitoring Architecture (GMA). The GMA defines producers and consumers of information and a registry that knows the location of all consumers and producers; R-GMA provides Consumer, Producer, Registry and Schema services. The consumer service allows the user to issue a number of different types of query: history, latest and continuous. History queries are queries over time-sequenced data, and latest queries correspond to the intuitive idea of current information. For a continuous query, new data are broadcast to all subscribed consumers as soon as those data are published via a producer. Consumers are automatically matched with producers of the appropriate type that will satisfy their query. Data published by application code is stored by a producer service. R-GMA provides a producer service that includes primary and secondary producers. Primary producers are the initial source of data within an R-GMA system.
Secondary producers can be used to republish data in order to co-locate information to speed up queries (and allow multi-table queries), to reduce network traffic and to offer different producer properties. It is envisaged that there will be numerous primary producers and one or two secondary producers for each subset of data. Both primary and secondary producers may use memory or a database to store the data and may specify retention periods. Memory producers give the best performance for continuous queries, whereas database producers give the best performance where joins are required. It is not necessary for users to know where other producers and consumers are: this is managed by the local producer and consumer services on behalf of the user. In most cases it is not even necessary to know the location of the local producer and consumer services, as worker nodes and user interface nodes are already configured to point to their local R-GMA producer and consumer services. There are already a number of applications using R-GMA. The first example is job monitoring. There was a requirement to allow grid users to monitor the progress of their jobs and for VO administrators to get an overview of what was happening on the grid. The problems were that the location in which a grid job would end up was not known in advance, and that worker nodes were behind firewalls so they were not accessible remotely. SA1 has adopted the job wrapper approach, as this did not require any changes to the application code. Every job is put in a wrapper that periodically publishes information about the state of the process running the job and its environment. These data are currently being published via the SA1 JobMonitoring table within R-GMA. A second application has been written to run on the resource broker nodes. This application examines the logging and bookkeeping logs and publishes data about the changes in state of grid jobs. 
These data are made available via the SA1 JobStatusRaw table. Both the producer in the job wrapper and the producers on the resource broker nodes make use of R-GMA memory primary producers. A database secondary producer is used to aggregate the data. Other uses of R-GMA include application monitoring, network monitoring and gridFTP monitoring. There are a number of different ways to implement application monitoring including the wrapper approach, as the job monitoring, and instrumentation of the application code. Instrumentation of the code can mean using a logging service, e.g. log4j, which publishes data via R-GMA, or calling R-GMA API methods directly from the application code. The network monitoring group, NA4, have been using R-GMA to publish a number of network metrics. They used memory primary producers in the network sensors to publish the data and a database secondary producer to aggregate the data. SA1 have made use of the consumer service for monitoring grid FTP metrics. They have written a memory primary producer that sits on the gridFTP server nodes and publishes statistics about the file transfers. A continuous consumer is used to pull in all the data to a central location, from where it is written to an Oracle database for analysis. This was used for Service Challenge 3. Two patterns have emerged from the use made of R-GMA for monitoring. In both patterns data is initially published using memory primary producers. These may be short lived and only make the data available for a limited time, e.g. the lifetime of a grid job. In one pattern data are made persistent by using a consumer to populate an external database which applications query directly. In the other pattern, an R-GMA secondary producer is used to make the data persistent and also make it available for querying through R-GMA. 
In the coming months we plan to add support for multiple Virtual Data Bases, authorization within the context of a Virtual Data Base using VOMS attributes, registry replication, load balancing over multiple R-GMA servers and support for Oracle. R-GMA is an information and monitoring system that has been specifically designed for the grid environment. It can be used by systems, VOs and individuals and is already in use in production.
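The producer/consumer/registry pattern and the three query types described above can be sketched as a small in-memory Python model. This is purely illustrative: the class and method names below are invented for this sketch and are not R-GMA's actual API, and real R-GMA queries are expressed in SQL against the virtual database.

```python
import time
from collections import defaultdict

class Registry:
    """Toy stand-in for the R-GMA Registry: tracks which producers
    publish each table and which continuous-query callbacks exist."""
    def __init__(self):
        self.producers = defaultdict(list)   # table name -> [producer]
        self.continuous = defaultdict(list)  # table name -> [callback]

    def register_producer(self, table, producer):
        self.producers[table].append(producer)

    def subscribe(self, table, callback):
        self.continuous[table].append(callback)

class MemoryPrimaryProducer:
    """Toy memory primary producer: stores timestamped rows and pushes
    each new row to subscribed continuous consumers via the registry."""
    def __init__(self, registry, table):
        self.registry = registry
        self.table = table
        self.rows = []                       # list of (timestamp, row)
        registry.register_producer(table, self)

    def insert(self, row, timestamp=None):
        ts = timestamp if timestamp is not None else time.time()
        self.rows.append((ts, row))
        for callback in self.registry.continuous[self.table]:
            callback(ts, row)                # continuous-query broadcast

class Consumer:
    """Toy consumer supporting the three R-GMA query types."""
    def __init__(self, registry):
        self.registry = registry

    def history(self, table):
        """History query: all time-sequenced data, oldest first."""
        out = []
        for producer in self.registry.producers[table]:
            out.extend(producer.rows)
        return sorted(out, key=lambda pair: pair[0])

    def latest(self, table, key):
        """Latest query: the most recent row per value of `key`."""
        newest = {}
        for ts, row in self.history(table):
            newest[row[key]] = row           # later rows overwrite earlier
        return newest

    def continuous(self, table, callback):
        """Continuous query: new rows delivered as they are published."""
        self.registry.subscribe(table, callback)

# Usage, loosely modelled on the job-monitoring example above
# (table and column names are illustrative, not the SA1 schema):
reg = Registry()
jobs = MemoryPrimaryProducer(reg, "JobMonitoring")
seen = []
Consumer(reg).continuous("JobMonitoring", lambda ts, row: seen.append(row))
jobs.insert({"jobId": "42", "state": "RUNNING"}, timestamp=1.0)
jobs.insert({"jobId": "42", "state": "DONE"}, timestamp=2.0)
```

A secondary producer would fit this model as a component that issues a continuous query and re-inserts the received rows into its own (database-backed) store, which is how the aggregation step above works.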
        Speaker: Dr Steve Fisher (RAL)
      • 5:30 PM
        Final discussion on the session topics 1h
    • 1:00 PM 2:00 PM
      Lunch 1h