EGEE User Forum

Europe/Zurich
CERN

Description

The EGEE (Enabling Grids for E-sciencE) project provides the largest production grid infrastructure for applications. In the first two years of the project an increasing number of diverse user communities have been attracted by the possibilities offered by EGEE and have joined the initial user communities. The EGEE user community feels it is now appropriate to meet, share experiences, and set new targets for the future, covering both the evolution of existing applications and the development and deployment of new applications on the EGEE infrastructure.

The EGEE User Forum will provide an important opportunity for innovative applications to establish contacts with EGEE and with other user communities, to plan the future usage of the EGEE grid infrastructure, to learn about the latest advances, and to discuss the future evolution of the grid middleware. The main goal is to create a dynamic user community, starting from the base of existing users, which can increase the effectiveness of the current EGEE applications and promote the fast and efficient uptake of grid technology by new disciplines. EGEE fosters pioneering usage of its infrastructure by encouraging collaboration between diverse scientific disciplines. It does this in order to evolve and expand the services offered to the EGEE user community, maximising the scientific, technological and economic relevance of grid-based activities.

We would like to invite hands-on users of the EGEE grid infrastructure to submit an abstract for this event, following the suggested template.

EGEE User Forum Web Page
Participants
  • Adrian Vataman
  • Alastair Duncan
  • Alberto Falzone
  • Alberto Ribon
  • Ales Krenek
  • Alessandro Comunian
  • Alexandru Tudose
  • Alexey Poyda
  • Algimantas Juozapavicius
  • Alistair Mills
  • Alvaro del Castillo San Felix
  • Andrea Barisani
  • Andrea Caltroni
  • Andrea Ferraro
  • Andrea Manzi
  • Andrea Rodolico
  • Andrea Sciabà
  • Andreas Gisel
  • Andreas-Joachim Peters
  • Andrew Maier
  • Andrey Kiryanov
  • Aneta Karaivanova
  • Antonio Almeida
  • Antonio De la Fuente
  • Antonio Laganà
  • Antony Wilson
  • Arnaud PIERSON
  • Arnold Meijster
  • Benjamin Gaidioz
  • Beppe Ugolotti
  • Birger Koblitz
  • Bjorn Engsig
  • Bob Jones
  • Boon Low
  • Catalin Cirstoiu
  • Cecile Germain-Renaud
  • Charles Loomis
  • CHOLLET Frédérique
  • Christian Saguez
  • Christoph Langguth
  • Christophe Blanchet
  • Christophe Pera
  • Claudio Arlandini
  • Claudio Grandi
  • Claudio Vella
  • Claudio Vuerli
  • Claus Jacobs
  • Craig Munro
  • Cristian Dittamo
  • Cyril L'Orphelin
  • Daniel JOUVENOT
  • Daniel Lagrava
  • Daniel Rodrigues
  • David Colling
  • David Fergusson
  • David Horn
  • David Smith
  • David Weissenbach
  • Davide Bernardini
  • Dezso Horvath
  • Dieter Kranzlmüller
  • Dietrich Liko
  • Dmitry Mishin
  • Doina Banciu
  • Domenico Vicinanza
  • Dominique Hausser
  • Eike Jessen
  • Elena Slabospitskaya
  • Elena Tikhonenko
  • Elisabetta Ronchieri
  • Emanouil Atanassov
  • Eric Yen
  • Erwin Laure
  • Esther Acción García
  • Ezio Corso
  • Fabrice Bellet
  • Fabrizio Pacini
  • Federica Fanzago
  • Fernando Felix-Redondo
  • Flavia Donno
  • Florian Urmetzer
  • Florida Estrella
  • Fokke Dijkstra
  • Fotis Georgatos
  • Fotis Karayannis
  • Francesco Giacomini
  • Francisco Casatejón
  • Frank Harris
  • Frederic Hemmer
  • Gael Youinou
  • Gaetano Maron
  • Gavin McCance
  • Gergely Sipos
  • Giorgio Maggi
  • Giorgio Pauletto
  • Giovanna Stancanelli
  • Giuliano Pelfer
  • Giuliano Taffoni
  • Giuseppe Andronico
  • Giuseppe Codispoti
  • Hannah Cumming
  • Hannelore Hammerle
  • Hans Gankema
  • Harald Kornmayer
  • Horst Schwichtenberg
  • Huard Helene
  • Hugues BENOIT-CATTIN
  • Hurng-Chun LEE
  • Ian Bird
  • Ignacio Blanquer
  • Ilyin Slava
  • Iosif Legrand
  • Isabel Campos Plasencia
  • Isabelle Magnin
  • Jacq Florence
  • Jakub Moscicki
  • Jan Kmunicek
  • Jan Svec
  • Jaouher KERROU
  • Jean Salzemann
  • Jean-Pierre Prost
  • Jeremy Coles
  • Jiri Kosina
  • Joachim Biercamp
  • Johan Montagnat
  • John Walk
  • John White
  • Jose Antonio Coarasa Perez
  • José Luis Vazquez
  • Juha Herrala
  • Julia Andreeva
  • Kerstin Ronneberger
  • Kiril Boyanov
  • Konstantin Skaburskas
  • Ladislav Hluchy
  • Laura Cristiana Voicu
  • Laura Perini
  • Leonardo Arteconi
  • Livia Torterolo
  • Losilla Guillermo Anadon
  • Luciano Milanesi
  • Ludek Matyska
  • Lukasz Skital
  • Luke Dickens
  • Malcolm Atkinson
  • Marc Rodriguez Espadamala
  • Marc-Elian Bégin
  • Marcel Kunze
  • Marcin Plociennik
  • Marco Cecchi
  • Mariusz Sterzel
  • Marko Krznaric
  • Markus Schulz
  • Martin Antony Walker
  • Massimo Lamanna
  • Massimo Marino
  • Miguel Cárdenas Montes
  • Mike Mineter
  • Mikhail Zhizhin
  • Mircea Nicolae Tugulea
  • Monique Petitdidier
  • Muriel Gougerot
  • Nadezda Fialko
  • Nadine Neyroud
  • Nick Brook
  • Nicolas Jacq
  • Nicolas Ray
  • Nils Buss
  • Nuno Santos
  • Osvaldo Gervasi
  • Othmane Bouhali
  • Owen Appleton
  • Pablo Saiz
  • Panagiotis Louridas
  • Pasquale Pagano
  • Patricia Mendez Lorenzo
  • Pawel Wolniewicz
  • Pedro Andrade
  • Peter Kacsuk
  • Peter Praxmarer
  • Philippa Strange
  • Philippe Renard
  • Pier Giovanni Pelfer
  • Pietro Lio
  • Pietro Liò
  • Rafael Leiva
  • Remi Mollon
  • Ricardo Brito da Rocha
  • Riccardo di Meo
  • Robert Cohen
  • Roberta Faggian Marque
  • Roberto Barbera
  • Roberto Santinelli
  • Rolandas Naujikas
  • Rolf Kubli
  • Rolf Rumler
  • Romier Genevieve
  • Rosanna Catania
  • Sabine ELLES
  • Sandor Suhai
  • Sergio Andreozzi
  • Sergio Fantinel
  • Shkelzen RUGOVAC
  • Silvano Paoli
  • Simon Lin
  • Simone Campana
  • Soha Maad
  • Stefano Beco
  • Stefano Cozzini
  • Stella Shen
  • Stephan Kindermann
  • Steve Fisher
  • Tao-Sheng CHEN
  • Texier Romain
  • Toan Nguyen
  • Todor Gurov
  • Tomasz Szepieniec
  • Tony Calanducci
  • Torsten Antoni
  • Tristan Glatard
  • Valentin Vidic
  • Valerio Venturi
  • Vangelis Floros
  • Vaso Kotroni
  • Venicio Duic
  • Vicente Hernandez
  • Victor Lakhno
  • Viet Tran
  • Vincent Breton
  • Vincent LEFORT
  • Vladimir Voznesensky
  • Wei-Long Ueng
  • Ying-Ta Wu
  • Yury Ryabov
  • Ákos Frohner
    • 2:00 PM 6:30 PM
      2b: Data access on the grid 40-SS-D01

      40-SS-D01

      CERN

      • 2:00 PM
        GDSE: A new data source oriented computing element for Grid 20m
        1. The technique addressed in connection with concrete use cases
        In a Grid environment the main components that manage a job's life are the Grid Resource Framework Layer, the Grid Information System Framework and the Grid Information Data Model. Since the job's life is strongly coupled with its computational environment, the Grid middleware must be aware of the specific computing resources managing the job. Until now, only two types of computational resource, hardware machines and some batch queueing systems, have been accepted as valid Resource Framework Layer instances. However, other types of virtual computing machine exist, such as the Java Virtual Machine, the Parallel Virtual Machine and the Data Source Engine (DSE). Moreover, the Grid Information System and Data Model have been used to represent hardware computing machines, without considering that a software computational machine is also a resource that can be well represented. This work addresses the extension of the Grid Resource Framework Layer, the Information System and the Data Model so that a software virtual machine such as a Data Source Engine becomes a valid instance of the Grid computing model, the so-called Grid-Data Source Engine (G-DSE). Once the G-DSE has been defined, a new Grid element, the Query Element (QE), can in turn be defined; it enables access to a Data Source Engine and Data Source, fully integrated with the Grid Monitoring and Discovery System and with the Resource Broker. The G-DSE has been designed and set up in the framework of the GRID.IT project, a multidisciplinary Italian project funded by the Ministry of Education, University and Research; the Italian astrophysical community participates in this project by porting three applications to the Grid, one of them devoted to the extraction of data from astrophysical databases and their reduction, exploiting resources and services shared on the INFN Grid infrastructure, whose middleware is LCG based. The use case we envisaged for this application reflects the typical way astronomers work. Astronomers typically need to 1) discover astronomical data residing in databases spread worldwide, a discovery process driven by a set of metadata fully describing the data the user looks for; 2) if data are found in some archive on the network, retrieve and process them with a suite of appropriate reduction software tools; data can also be cross-correlated with similar data residing elsewhere or just acquired by the astronomer; 3) if the data are not found, acquire them with astronomical instrumentation or generate them on the fly with suitable simulation software; 4) at the end of the data processing phase, save the results in some database reachable on the network. In the framework of our participation in the GRID.IT project we realized that the LCG Grid infrastructure, based on Globus 2.4, is strongly computing-centric and does not offer any mechanism to access databases in a way that is transparent to end users. For this reason, after evaluating a number of possible solutions such as Spitfire and OGSA-DAI, it was decided to develop the Grid middleware further so that it could fully satisfy our application's demands.
        It is worth noting that a use case like the one described above is not peculiar to the astrophysical community; it applies to other disciplines where access to data stored in complex structures such as databases is of key importance. Within the GRID.IT project the extended LCG Grid middleware has been extensively tested, showing that the solution under development enables Grid technology to fully meet the requirements of typical astrophysical applications. The G-DSE is currently a prototype; further work is needed to refine it and bring it to production quality. Once the Grid middleware has been enhanced with the G-DSE, the new QE can be set up. The QE is a specialized CE able to interact with databases, using the G-DSE capabilities, and treating them as resources embedded within the Grid, like a computing resource or a disk-resident file. The QE can process and handle complex workflows involving both traditional Grid resources and the new ones; database resources in particular may be seen and used as data repositories and even as virtual computing machines that process the data stored within them.
        2. Best practices and application-level tools to exploit the technique on EGEE
        A suite of tools is currently being designed and set up to make it easy for applications to use the functionality of a G-DSE-enabled Grid infrastructure. These tools are mainly intended to help users prepare the JDL scripts that exploit the G-DSE capabilities and, ultimately, the functionality offered by the new Grid QE. The final goal, however, is to offer end users graphical tools to design their workflows and pass them to the QE for analysis and processing. A precondition for achieving these results is to have the G-DSE, and then the QE, fully integrated in the Grid middleware used by EGEE.
        3. Key improvements needed to better exploit this technique on EGEE
        The current prototype of the G-DSE is not yet included in the Grid middleware flavours on which the EGEE infrastructure is based. The tests carried out on the G-DSE prototype so far have used a parallel test-bed Grid infrastructure set up through the collaboration between INFN and INAF. This parallel infrastructure consists of a BDII and an RB on which the modified Grid components constituting the G-DSE have been installed. The mandatory precondition for using the G-DSE is therefore the inclusion of these modified middleware components in the Grid infrastructure used by EGEE.
        4. Industrial relevance
        The G-DSE was originally conceived to solve a specific problem of a scientific community, and the analysis of new application fields has so far focused on scientific research. However, because the G-DSE is a general solution for turning any database into an embedded Grid resource, regardless of the nature and kind of data it contains, its applicability extends naturally to industrial applications wherever access to complex data structures is a crucial aspect.
        Speaker: Dr Giuliano Taffoni (INAF - SI)
        Slides
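        The abstract above mentions JDL scripts that exploit the G-DSE and the new Query Element. As a purely illustrative sketch (the attribute names below, such as JobType = "Query" and DataSource, are assumptions and not the actual G-DSE/QE schema), the following Python fragment composes a JDL-like description for a database query job:

        # Minimal sketch: composing a JDL-like description for a hypothetical
        # Query Element job. Attribute names (JobType, DataSource, Query) are
        # illustrative assumptions, not the actual G-DSE/QE schema.

        def make_query_jdl(query, data_source, output_file):
            """Return a JDL-style text block describing a database query job."""
            attributes = {
                "Type": '"Job"',
                "JobType": '"Query"',               # assumed marker for a QE job
                "DataSource": f'"{data_source}"',   # assumed logical name of the G-DSE resource
                "Query": f'"{query}"',
                "StdOutput": f'"{output_file}"',
                "OutputSandbox": f'{{"{output_file}"}}',
            }
            lines = ["["] + [f"  {k} = {v};" for k, v in attributes.items()] + ["]"]
            return "\n".join(lines)

        if __name__ == "__main__":
            jdl = make_query_jdl(
                query="SELECT ra, dec, mag FROM catalogue WHERE mag < 18",
                data_source="astro-db.example.org",
                output_file="result.votable",
            )
            print(jdl)  # in a real setting this text would be submitted to the broker

        In a real deployment the resulting description would be handed to the Resource Broker, which would match it to a Query Element rather than to an ordinary Computing Element.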
      • 2:20 PM
        Development of gLite Web Service Based Security Components for the ATLAS Metadata Interface 20m
        Introduction
        AMI (ATLAS Metadata Interface) is an evolving application which stores, and allows access to, dataset metadata for the ATLAS experiment. It is a response to the large number of database-backed applications needed by an LHC experiment such as ATLAS, all with similar interface requirements. It serves many applications by offering a generic web service and servlet interface, through the use of self-describing databases. Schema evolution can be managed easily, as AMI makes no assumptions about the underlying database structure. Within AMI, data is organized in "projects"; each project can contain several namespaces (*). The schema discovery mechanism means that independently developed schemas can be managed with the same software. This paper summarises the impact on AMI of the requirements of five gLite metadata interfaces. These interfaces, namely MetadataBase, MetadataCatalog, ServiceBase, FASBase and MetadataSchema [1], address a range of previously identified use cases on dataset (and logical file) metadata from particle physicists and project administrators working on the ATLAS experiment. The future impact on the AMI architecture of the VOMS security structure and of the gLite search interface is also discussed.
        Fundamental Architecture of AMI
        The AMI core software can be used in a client-server model. There are three possibilities for a client (software installed on the client side, a browser, and web services), but the relevant client with regard to grid services is the web services client. Within AMI there are generic packages which constitute the middle layer of its three-tier architecture. The command classes found in these packages are key to the implementation of the gLite methods of each interface. The implemented gLite interfaces therefore sit on the server side in this middle layer and interface directly with the client tier and with the command classes. For each gLite interface method it is possible to choose a corresponding AMI command that matches its basic requirements.
        [Figure 1] Figure 1: A schematic view of the software architecture of AMI [2]. The diagram shows the AMI-compliant databases as the top layer, which interfaces with the lowest software layer, JDBC. The middle-layer BkkJDBC package allows connection to both MySQL and Oracle. The generic packages contain the command classes used to manage the databases. Application-specific software in the outer layer can include the generic web search pages.
        The procedure used to understand the structure necessary to implement the gLite methods was to observe how AMI absorbs commands into its middle-tier mechanism. This was achieved by mapping the delegation of methods through the relevant code, and is best illustrated by the UML sequence diagram in Figure 2. AMI can be deployed as a web application in a web container such as Tomcat. To set up web services for AMI it is necessary to plug the Axis framework into Tomcat; then, using WSDL and the Axis tools that convert WSDL to Java client classes, a Java web service client class can be deployed which communicates with the gLite interfaces.
        (*) A namespace is a "database" in MySQL terms, a "schema" in Oracle and a "file" in SQLite.
        [Figure 2] Figure 2: UML sequence diagram of the basic workings of AMI. A controller class decides which command class is invoked; a router loader is instantiated to connect to a database, and XML output is returned to the gLite interface implementation class.
        A direct consequence of grid services is secure access, which involves authentication and authorisation of users and machines. Authorisation in AMI is handled by a local role-based mechanism, and authentication is implemented by securing the web services with grid certificates. Permissions in AMI are currently based on this local role system. An EGEE-wide role system, the Virtual Organization Membership Service (VOMS) [3], is being developed; AMI will then have to be set up to read and understand VOMS attributes and grant permissions based on a user's role in ATLAS. Requirements analysis on the impact of VOMS on the AMI architecture is currently under way. Also directly relevant to the gLite interface was the implementation of a query language for performing cascaded searches through all projects. This implementation used the JFlex library to define our own grammar rules, following the EGEE gLite Metadata Query Language (MQL) specification. It allows AMI to execute a generic search on several databases of any type (MySQL, Oracle or SQLite, for example) starting from a single MQL query.
        Conclusion
        This paper describes the implementation of the gLite interfaces for AMI. It summarises how AMI was set up with these implementation classes interfacing with web service clients, and how these clients are secured with the aid of grid certificates. AMI provides a set of generic tools for managing database applications and supports geographical distribution through web services. Implementing the gLite interfaces as a wrapper around AMI via these web services gives the user a generic and secure metadata interface. Together with the gLite search interface, any third-party application should be able to plug in AMI knowing that it supports a well-defined API.
        References
        [1] Developer's Guide for the gLite EGEE Middleware - http://edms.cern.ch/document/468700
        [2] ATLAS Metadata Interfaces (AMI) and ATLAS Metadata Catalogs, Solveig Albrand, Jerome Fulachier, LPSC Grenoble
        [3] VOMS - http://hep-project-grid-scg.web.cern.ch/hep-project-grid-scg/voms.html
        Speaker: Mr Thomas Doherty (University of Glasgow)
        Slides
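        As an illustration of the delegation pattern described above (a controller mapping a gLite-style interface call onto an AMI command class that returns XML, cf. Figure 2), here is a minimal Python sketch; the class and command names are invented for the example and are not AMI's real ones:

        # Minimal sketch of the delegation pattern: an incoming gLite-style
        # interface call is mapped by a controller onto a "command" class that
        # consults the data store and returns XML. Names are illustrative only.
        import xml.etree.ElementTree as ET

        class ListEntriesCommand:
            """Hypothetical command class: lists dataset entries matching a pattern."""
            def __init__(self, store):
                self.store = store
            def execute(self, pattern):
                root = ET.Element("response")
                for name in self.store:
                    if pattern in name:
                        ET.SubElement(root, "row", name=name)
                return ET.tostring(root, encoding="unicode")

        class Controller:
            """Delegates interface method names to command classes."""
            def __init__(self, store):
                self.commands = {"listEntries": ListEntriesCommand(store)}
            def dispatch(self, method, *args):
                return self.commands[method].execute(*args)

        if __name__ == "__main__":
            datasets = ["mc11.dataset.A", "data06.dataset.B"]
            controller = Controller(datasets)
            print(controller.dispatch("listEntries", "dataset"))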
      • 2:40 PM
        The AMGA Metadata Service 20m
        We present the ARDA Metadata Grid Application (AMGA), which is part of the gLite middleware. AMGA provides a lightweight service to manage, store and retrieve simple relational data on the grid, termed metadata. In this presentation we first give an overview of AMGA's design, functionality, implementation and security features. AMGA was designed in close collaboration with the different EGEE user communities and combines high performance, which was very important to the high-energy physics community, with the fine-grained access restrictions required in particular by the biomedical community. These access restrictions make full use of the EGEE VOMS services and are based on grid certificates. To show to what extent the users' requirements have been met, we present performance measurements, including comparisons of AMGA with direct database access and with other Grid catalogue services, as well as use cases for the security features.
        Several applications currently use AMGA to store their metadata. Among them are the MDM (Medical Data Manager) application implemented by the biomedical community, the GANGA physics analysis tool of the ATLAS and LHCb experiments, and a Digital Library from the generic applications. The MDM application uses AMGA to store, in several tables, relational information on medical images stored on the grid, plus information on patients and doctors. User applications can retrieve images based on their metadata for further processing. Access restrictions are of the highest importance to the MDM application because the stored data is highly confidential; MDM therefore makes use of AMGA's fine-grained access restrictions. The GANGA application uses AMGA to store the status information of grid jobs controlled by GANGA; AMGA's simple relational database features are mainly used to ensure consistency when several GANGA clients of the same user access the stored information remotely. Finally, the Digital Library project uses AMGA in a way similar to MDM, but provides many different schemas to store information not only on images but also on texts, movies or music. Another difference is that only a central librarian updates the library, while for MDM updates are triggered by the many image acquisition systems themselves.
        This presentation will also discuss future developments of AMGA, in particular its features for replicating or federating metadata. These will mainly give users a better scaling behaviour, but could also improve security by using federation to physically separate metadata. The replication features will be compared with current proprietary database replication solutions.
        Speaker: Dr Birger Koblitz (CERN-IT)
        Slides
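        The usage pattern described above (entries carrying attributes, queried under fine-grained access control) can be illustrated with a small in-memory stand-in. This is not the real AMGA client API, just a sketch of the idea:

        # Illustrative stand-in (not the real AMGA client API): entries carry
        # named attributes and an access list, and can be queried by attribute,
        # e.g. retrieving medical images by metadata while enforcing per-entry
        # access control as described in the abstract above.

        class MetadataCatalogue:
            def __init__(self):
                self.entries = {}                      # path -> (attributes, acl)
            def add_entry(self, path, attributes, acl):
                self.entries[path] = (dict(attributes), set(acl))
            def query(self, user, predicate):
                """Return paths visible to 'user' whose attributes satisfy 'predicate'."""
                return [p for p, (attrs, acl) in self.entries.items()
                        if user in acl and predicate(attrs)]

        if __name__ == "__main__":
            cat = MetadataCatalogue()
            cat.add_entry("/mdm/image42.dcm",
                          {"modality": "CT", "patient": "anon-007"},
                          acl={"dr_smith"})
            print(cat.query("dr_smith", lambda a: a["modality"] == "CT"))
            print(cat.query("someone_else", lambda a: True))   # empty: no access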
      • 3:00 PM
        Use of Oracle software in the CERN Grid 20m
        Oracle is known as a database vendor, but has much more to offer than data storage solutions. This talk discusses some key Oracle products that are in use, or are currently being tested at full scale, at CERN. It will primarily be an open discussion, and interactive feedback from the audience is more than welcome. The following topics will be discussed. Oracle client software distribution: how a large to very large number of systems can easily be enabled to connect to Oracle database servers, what the distribution rights are, and how the software is actually distributed and configured. Oracle support for Linux: Oracle officially supports those Linux distributions that are in widespread use and strongly recommends that servers be run on supported distributions; this does not, however, imply that other Linux distributions cannot be used at all, and the talk will elaborate on this. Oracle Streams replication: the various possibilities for using Oracle Streams to replicate large amounts of data will be discussed.
        Speaker: Bjorn Engsig (ORACLE)
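        As a small illustration of what distributed Oracle client software enables, the sketch below opens a connection from Python with the cx_Oracle driver and runs a trivial query; the credentials and host are placeholders, and an installed Oracle client is assumed:

        # Minimal connection sketch using the cx_Oracle driver (placeholder
        # credentials and host; requires the Oracle client libraries).
        import cx_Oracle

        def fetch_one(sql):
            # EZConnect-style descriptor: user/password@host:port/service_name
            conn = cx_Oracle.connect("reader/secret@dbhost.example.org:1521/orcl")
            try:
                cur = conn.cursor()
                cur.execute(sql)
                return cur.fetchone()
            finally:
                conn.close()

        if __name__ == "__main__":
            print(fetch_one("SELECT sysdate FROM dual"))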
      • 3:20 PM
        Discussion 40m
        Discussion on metadata catalogues
      • 4:00 PM
        break 25m
      • 4:25 PM
        The gLite File Transfer Service 20m
        In this paper we describe the architecture and implementation of the gLite File Transfer Service (FTS) and list the most basic deployment scenarios. The FTS addresses the need to manage massive wide-area data transfers on dedicated network channels while allowing the involved sites and users to manage their policies. The FTS manages the transfers in a robust way, allowing for an optimized high throughput between storage systems. It can be used to perform the LHC Tier-0 to Tier-1 data transfer as well as the Tier-1 to Tier-2 data distribution and collection. The peculiarities of the storage systems can be taken into account by fine-tuning the parameters of the FTS managing a particular channel. The manageability-related features, as well as the interaction with the other components that form part of the overall service, are also described. The FTS is extensible, so that particular user groups or experiment frameworks can customize its behaviour for both pre- and post-transfer tasks.
        The FTS has been designed based on the experience gathered from the Radiant service used in Service Challenge 2, as well as from the CMS PhEDEx transfer service. The first implementation of the FTS was put to use at the beginning of summer 2005. We report in detail on the features that were requested following this initial usage and the needs that these new features address; most of them have already been implemented or are being finalized. There was a need to improve the manageability of the service in terms of supporting site and VO policies. Due to the different implementations of specific storage systems, the choice between third-party gsiftp transfers and SRM-copy transfers is nontrivial, and was requested as a configurable option for selected transfer channels. The way proxy certificates are delegated to the service and used to perform the transfer, as well as how proxy renewal is done, has been completely reworked based on experience. A new interface has been added to enable administrators to perform management directly by contacting the FTS, without the need to restart the service. Another new interface has been added to deliver statistics and reports to the sites and VOs interested in monitoring information; this is also presented through a web interface using JavaScript. Stage-pool handling is being added to the FTS to allow pre-staging of sources without blocking transfer slots on the source, and to allow the implementation of back-off strategies in case the remote staging areas start to fill up.
        The reliable transport of data is one of the cornerstones of distributed systems. The transport mechanisms have to be scalable and efficient, making optimal use of the available network and storage bandwidth. In production grids the most important requirement is robustness, meaning that the service needs to run over extended periods of time with little supervision. Moreover, the transfer middleware has to be able to apply policies on failure, adapting parameters dynamically or raising alerts where necessary. In large grids there is the additional complication of having to support multiple administrative domains while enforcing local site policies. At the same time, the Grid application needs to be given uniform interface semantics independent of site-local policies.
        There are several file transfer mechanisms in use today in data grids, such as http(s), (s)ftp, scp or bbftp, but probably the most commonly used one is GridFTP, which provides a highly performant, secure transfer service. The Storage Resource Manager (SRM) interface, which is being standardized through the Global Grid Forum, provides a common way to interact with a Storage Element, as well as a data movement facility, SRM copy, which in most implementations will again use GridFTP to perform the transfer on the user's behalf between two sites. The File Transfer Service is the low-level point-to-point file movement service provided by the gLite middleware of the EU-funded Enabling Grids for E-sciencE (EGEE) project. It has been designed to address the challenging requirements of a reliable file transfer service in production Grid environments. What distinguishes the FTS from other reliable transfer services is its design for policy management. The FTS can also act as the resource manager's policy enforcement tool for a dedicated network link between two sites, as it is capable of managing the policies of the resource owner as well as those of the users (the VOs), and it has dedicated interfaces to manage these policies. The FTS is also extensible: upon certain events, user-definable functions can be executed. The VOs may use this extensibility point to call other services when transfers complete (e.g. to register replicas in catalogues) or to change the policies for certain error-handling operations (e.g. the retry strategy).
        The LHC Computing Grid (LCG) is the project that has built and maintains a data storage and analysis infrastructure for the entire high-energy physics community of the Large Hadron Collider (LHC), the largest scientific instrument on the planet, located at CERN. The data from the LHC experiments will be distributed around the globe according to a multi-tiered model in which CERN is the "Tier-0", the centre of LCG. The goal of the LCG Service Challenges is to provide a production-quality environment where services are run for long periods with 24/7 operational support; these services include the network and reliable file transfer services. In summer 2005, Service Challenge 3 started with the gLite File Transfer Service and CMS PhEDEx. The gLite FTS benefited from this collaboration and from the experience with the prototype LCG Radiant service used in Service Challenge 2. This meant that from the beginning its design took into account the requirements imposed by a production Grid infrastructure. The continuous interaction with the experiments made it possible to react quickly to reported problems and to keep the development focused on real use cases.
        Speaker: Mr Paolo Badino (CERN)
        Slides
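        The abstract above refers to per-channel policies and retry/back-off strategies for failed transfers. The following conceptual Python sketch (not the actual FTS code; names and defaults are assumptions) shows such a policy in miniature:

        # Conceptual sketch: a transfer job on a channel is retried with an
        # exponential back-off until the per-channel retry limit is exhausted,
        # mirroring the policy knobs described above. Delays are only printed,
        # not slept, to keep the example self-contained.
        import random

        class Channel:
            def __init__(self, name, max_retries=3, base_delay=30):
                self.name = name
                self.max_retries = max_retries
                self.base_delay = base_delay      # seconds; illustrative default

        def run_transfer(source, dest):
            """Stand-in for an actual gridftp/SRM-copy transfer; fails randomly."""
            return random.random() > 0.4

        def submit(channel, source, dest):
            for attempt in range(channel.max_retries + 1):
                if run_transfer(source, dest):
                    return "Done"
                if attempt < channel.max_retries:
                    delay = channel.base_delay * 2 ** attempt
                    print(f"attempt {attempt + 1} failed, retrying in {delay}s (simulated)")
            return "Failed"

        if __name__ == "__main__":
            cern_ral = Channel("CERN-RAL")
            print(submit(cern_ral, "srm://cern.ch/file", "srm://ral.ac.uk/file"))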
      • 4:45 PM
        Encrypted Data Storage in EGEE 20m
        The medical community routinely uses clinical images and associated medical data for diagnosis, intervention planning and therapy follow-up. Medical imaging is producing an increasing number of digital images for which computerized archiving, processing and analysis are needed. Grids are promising infrastructures for managing and analyzing these huge medical databases. Given the sensitive nature of medical images, however, practitioners are often reluctant to use distributed systems; security is often implemented by isolating the imaging network from the outside world inside hospitals. Given the wide-scale distribution of grid infrastructures and their multiple administrative entities, the level of security for manipulating medical data should be particularly high. In this presentation we describe the architecture of a solution, the gLite Encrypted Data Storage (EDS), which was developed in the framework of Enabling Grids for E-sciencE (EGEE), a project of the European Commission (contract number INFSO-508833). EDS enforces strict access control on any medical file stored on the grid. It also provides file encryption facilities that ensure the protection of data sent to remote storage, even from the administrators of that storage. Thus data are not only transferred but also stored encrypted, and can only be decrypted in host memory by authorized users.
        Introduction
        The basic building blocks of the grid data management architecture are the Storage Elements (SE), which provide transport (e.g. gridftp), direct data access (e.g. direct file access, rfio, dcap) and administrative (Storage Resource Management, SRM) interfaces for a storage system. However, the most widely adopted standard today for managing medical data in clinics is DICOM (Digital Imaging and Communications in Medicine). The simplified goal is to secure the data movement among these blocks and the client hosts which actually process the data.
        Challenges
        Here we describe the most important challenges and requirements of the medical community and how they are addressed by EDS on the current grid infrastructure.
        Access Control
        The most basic requirement is to restrict access to any data on the grid to permitted users. Although it looks like a simple requirement, the distributed nature of the architecture and the limitations of the building blocks required some work to satisfy it. The first problem faced is the complex access patterns of the medical community: it is usually not enough to define a single user or group allowed to access a file; instead, access is needed by a list of users. The solution is to use Access Control Lists (ACLs) instead of basic POSIX permission bits; however, most of the currently deployed Storage Elements do not provide ACLs. To resolve this semantic mismatch, we "wrapped" the existing Storage Elements in a service which enforces the access control settings according to the medical community's requirements. This service is the gLite I/O server, which is installed alongside every storage element used. The gLite I/O server provides a POSIX-like file access interface to remote clients and uses the direct data access methods of the Storage Element to access the data. It authenticates the clients and enforces authorization decisions (i.e. whether the client is allowed to read a file or not), so it acts as a Policy Enforcement Point in the middle of the data access.
        The authorization decision is not made inside the gLite I/O server. A separate service holds the ACLs (and other file metadata) of every file stored in the Storage Elements. In our deployment this is the gLite File and Replica Management (FiReMan) service, which acts as the Policy Decision Point in the architecture. The gLite FiReMan service is a central component which also acts as a file catalog (directory functionality), replica manager (which file has a copy on a given SE) and file authorization server (whether a given client is allowed to access a file). FiReMan supports rich ACL semantics, which satisfy the access pattern requirements of the medical community.
        Encryption
        The other important requirement is privacy: sensitive medical data shall not be stored on any permanent storage or transferred over the network unencrypted outside the originating hospital. The solution is to encrypt every file when it leaves the originating hospital's DICOM server and decrypt it only inside authorized client applications. For the first step we developed a specialized Storage Element, the Medical Data Manager (MDM) service, which "wraps" the hospital's DICOM server and offers interfaces compatible with other grid Storage Elements. In this way the hospital's data storage looks like just another Storage Element, for which we already have grid data management solutions. Despite the apparent similarity between the MDM service and an ordinary Storage Element there is an important difference: the MDM service serves only encrypted files. When a file is accessed through the grid interfaces, the service generates a new encryption key, encrypts the file and registers the key in a key store. Therefore every file which crosses the external network and is stored on an external element stays encrypted during its whole lifetime. On the client side we provide a transparent solution to decrypt the file: on top of the gLite I/O client libraries we developed a client library which can retrieve keys from the key storage and decrypt files on the fly. This client-side library provides a POSIX-like interface which hides the details of the remote data access, key retrieval and decryption.
        The key storage has to satisfy several requirements: it has to be reliable and secure, and it has to provide fine-grained access control for the keys. To satisfy these requirements we developed the gLite Hydra KeyStore. For reliability, keys are stored not in one place but in at least two locations. For security, a single service cannot store a full key, only a part of it; thus even if a service is compromised the keys cannot be fully recovered. We implemented Shamir's Secret Sharing scheme inside the client library to split and distribute the keys among at least three Hydra services, according to the above-mentioned requirements. The key storage also has to provide fine-grained access control on the keys, similar to that on the files. Our current solution applies the same ACLs as the FiReMan service, so one can be sure that only the users allowed to access a file can access its encryption key.
        Conclusion
        The solution for encrypted storage described above has already been released in the gLite software stack and has been deployed and demonstrated to work at a number of sites. As the underlying software stack of the grid evolves we will also adapt our solution to exploit new functionality and to simplify our additional security layer.
        Speaker: Ákos Frohner (CERN)
        Slides
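        The abstract above states that encryption keys are split with Shamir's Secret Sharing scheme and distributed among at least three Hydra services. The following self-contained Python sketch shows the splitting and reconstruction arithmetic for a toy case (a didactic example over a small prime field, not the gLite/Hydra implementation):

        # Toy illustration of Shamir's Secret Sharing: split a key into n shares
        # so that any k of them reconstruct it, while fewer reveal nothing.
        import random

        PRIME = 2 ** 127 - 1   # a Mersenne prime, large enough for a ~128-bit key

        def split(secret, n, k):
            """Return n shares (x, y); any k of them recover 'secret'."""
            coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
            def f(x):
                return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
            return [(x, f(x)) for x in range(1, n + 1)]

        def reconstruct(shares):
            """Lagrange interpolation at x = 0 over the prime field."""
            secret = 0
            for i, (xi, yi) in enumerate(shares):
                num, den = 1, 1
                for j, (xj, _) in enumerate(shares):
                    if i != j:
                        num = num * (-xj) % PRIME
                        den = den * (xi - xj) % PRIME
                secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
            return secret

        if __name__ == "__main__":
            key = random.randrange(PRIME)
            shares = split(key, n=3, k=2)   # e.g. three key stores, two needed (illustrative choice)
            assert reconstruct(shares[:2]) == key
            print("key recovered from 2 of 3 shares")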
      • 5:05 PM
        Use of the Storage Resource Manager Interface 20m
        SRM v2.1 features and status
        Version 2.1 of the Storage Resource Manager interface offers various features that are desired by EGEE VOs, particularly the HEP experiments: pinning and unpinning of files, relative paths, (VOMS) ACL support, directory operations and global space reservation. The features are described in the context of actual use cases and of their availability in the following widely used SRM implementations: CASTOR, dCache and DPM. The interoperability of the different implementations and SRM versions is discussed, along with the absence of desirable features such as quotas. Version 1.1 of the SRM standard is in widespread use, but has various deficiencies that are addressed to a certain extent by version 2.1. The two versions are incompatible, so clients and servers have to maintain both interfaces, at least for a while. Certain problems will only be dealt with in version 3, whose definition may not be completed for many months. There are various implementations of versions 1 and 2, developed by different collaborations for different user communities and service providers, with different requirements and priorities. In general a VO will have inhomogeneous storage resources, but a common SRM standard should make them compatible, so that data management tools and procedures need not be concerned with the actual types of the storage facilities.
        Speaker: Maarten Litmaath (CERN)
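        To make the listed SRM v2.1 features concrete, here is a small Python sketch of an abstract interface capturing them; the method names are simplified stand-ins for illustration, not the actual SRM operation names:

        # Illustrative abstraction of the SRM v2.1 feature set named above.
        # Method names are simplified stand-ins, not the real WSDL operations.
        from abc import ABC, abstractmethod

        class StorageResourceManager(ABC):
            @abstractmethod
            def pin(self, surl, lifetime_s): ...        # keep a file staged/online
            @abstractmethod
            def unpin(self, surl): ...
            @abstractmethod
            def mkdir(self, path): ...                  # directory operations
            @abstractmethod
            def set_acl(self, path, acl): ...           # (VOMS-based) access control
            @abstractmethod
            def reserve_space(self, size_bytes, lifetime_s): ...  # global space reservation

        # A VO's data management tools could code against this single interface,
        # while CASTOR, dCache and DPM each provide their own implementation.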
      • 5:25 PM
        Discussion 15m
        Discussion on grid data management
      • 5:40 PM
        Space Physics Interactive Data Resource - SPIDR 20m
        SPIDR (Space Physics Interactive Data Resource) is a de facto standard data source on solar-terrestrial physics, functioning within the framework of the ICSU World Data Centers. It is a distributed database and application server network, built to select, visualize and model historical space weather data distributed across the Internet. SPIDR can work as a fully functional web application (portal) or as a grid of web services, providing functions for other applications to access its data holdings. Currently the SPIDR archives include geomagnetic variations and indices, solar activity and solar wind data, ionospheric and cosmic-ray data, radio-telescope ground observations, and telemetry and images from NOAA, NASA and DMSP satellites. SPIDR database clusters and portals are installed in the USA, Russia, China, Japan, Australia, South Africa and India. The SPIDR portal combines the functionality of a central XML metadata repository, with two levels of metadata (descriptive and inventory), with a set of distributed data-source web services, web map services, and collections of raw observation data files. A user can search for data using the metadata inventory, use a persistent data basket to save a selection for the next session, and plot and download the selected data in parallel in different formats, including XML and NetCDF. A database administrator can upload new files into the SPIDR databases using either the web services or the web portal. The SPIDR databases are self-synchronising. User support on the portal includes a discussion forum, i-mail, a data basket for metadata bookmarks and selected data subsets, and usage tracking. SPIDR technology can be used for environmental data sharing, visualization and mining, not only in space physics but also in seismology, GPS measurements, tsunami warning systems, etc. All grid data services in SPIDR share the same Common Data Model and compatible metadata schemas.
        Speakers: Mr Dmitry Mishin (Institute of Physics of the Earth Russian Acad. Sci.), Dr Mikhail Zhizhin (Geophysical Center Russian Acad. Sci.)
        Slides
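        The abstract above describes SPIDR as a grid of web services from which selected observations can be downloaded in XML. A hypothetical Python client sketch follows; the endpoint URL, parameter names and XML layout are placeholders, not the actual SPIDR service definition:

        # Hypothetical sketch: fetch a time series in XML from a SPIDR-style data
        # web service and extract (time, value) pairs. URL, parameters and XML
        # layout are placeholders for illustration only.
        import urllib.request
        import urllib.parse
        import xml.etree.ElementTree as ET

        def fetch_series(base_url, param, station, date_from, date_to):
            query = urllib.parse.urlencode({
                "param": param, "station": station,
                "dateFrom": date_from, "dateTo": date_to, "format": "xml",
            })
            with urllib.request.urlopen(f"{base_url}?{query}") as resp:
                tree = ET.parse(resp)
            # assumed layout: <series><sample time="..." value="..."/>...</series>
            return [(s.get("time"), float(s.get("value")))
                    for s in tree.getroot().iter("sample")]

        if __name__ == "__main__":
            samples = fetch_series("http://spidr.example.org/services/getData",
                                   param="kp_index", station="global",
                                   date_from="2005-01-01", date_to="2005-01-02")
            print(samples[:5])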
      • 6:00 PM
        gLibrary: a Multimedia Contents Management System on the grid 20m
        Nowadays huge amounts of information are searched and used by people all over the world, but it is not always easy to find what one is looking for. Search engines help a lot, but they do not provide a standard and uniform way to make queries. The challenge of gLibrary is to design and develop a robust system to handle multimedia contents in an easy, fast and secure way, exploiting the Grid. Examples of multimedia contents are images, videos, music, all kinds of electronic documents (PDF, Excel, PowerPoint, Word, HTML), e-mails and so on. New types of content can be added easily to the system. Thanks to the fixed structure of the attributes for each content type, queries are easier to perform, allowing the users to choose their search criteria from a predefined set of attributes. The following are possible usage examples:
        - a user wants to look for all the comedies in which Jennifer Aniston performed together with Ben Stiller, produced in 2004, or find all the songs by Led Zeppelin that last more than 6 minutes;
        - a user needs to find all the PowerPoint presentations about data management systems given in 2005 by Uncle Sam (a fictitious name);
        - a doctor wants to retrieve all the articles and presentations about lung cancer and download some lung X-ray images to be printed in his article for a scientific magazine;
        - ("Google for storage") a job behaves as a "storage crawler": it scans all the files stored in Storage Elements and publishes their specific information into gLibrary for later searches through their attributes.
        Not all users of the system have the same authority. Three kinds of user are supported: gLibrary Generic Users, members of a Virtual Organization recognized by the system, can browse the library and make queries, and can also retrieve the files they want if the submitting user has authorized them; gLibrary Submitter Users can upload new entries, attaching the proper values for the defined attributes; finally, gLibrary Administrators are allowed to define new content types and to grant Generic Users submission rights. A first level of security on single files is implemented: files uploaded to Storage Elements can be encrypted with a symmetric key, which is placed in a special directory in the system, and the submitter defines which users have the right to read it. The whole application is built on top of the grid services offered by the EGEE middleware: the actual data is stored in Storage Elements spread around the world, while the File Catalog keeps track of where they are located. A Metadata Catalog service is used intensively to hold the attribute values and satisfy users' queries. Finally, a Virtual Organization Membership Service helps to deal with authorization.
        Speaker: Dr Tony Calanducci (INFN Catania)
        Slides
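        The ideas described above (content types with a fixed attribute set, attribute-based queries and role-based rights) can be sketched in a few lines of Python; the type definitions and role rules below are illustrative assumptions, not the actual gLibrary implementation:

        # Minimal sketch of the gLibrary ideas: content types with a fixed
        # attribute set, role-based rights, and attribute-based queries.
        CONTENT_TYPES = {
            "movie": {"title", "genre", "year", "actors"},
            "song":  {"title", "artist", "duration_s"},
        }

        ROLES = {"generic": {"query"},
                 "submitter": {"query", "submit"},
                 "admin": {"query", "submit", "define_type"}}

        class Library:
            def __init__(self):
                self.entries = []                       # list of (type, attributes)
            def submit(self, role, ctype, attrs):
                if "submit" not in ROLES[role]:
                    raise PermissionError("this role may not submit entries")
                if set(attrs) != CONTENT_TYPES[ctype]:
                    raise ValueError("attributes must match the content type schema")
                self.entries.append((ctype, attrs))
            def query(self, ctype, predicate):
                return [a for t, a in self.entries if t == ctype and predicate(a)]

        if __name__ == "__main__":
            lib = Library()
            lib.submit("submitter", "movie",
                       {"title": "Some 2004 Comedy", "genre": "comedy", "year": 2004,
                        "actors": ["Jennifer Aniston", "Ben Stiller"]})
            print(lib.query("movie",
                            lambda a: a["genre"] == "comedy" and a["year"] == 2004))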
      • 6:20 PM
        Discussion 10m
        Discussion on application data management