EGEE User Forum

Europe/Zurich
CERN

Description

The EGEE (Enabling Grids for E-sciencE) project provides the largest production grid infrastructure for applications. In the first two years of the project, an increasing number of diverse user communities have been attracted by the possibilities offered by EGEE and have joined the initial user communities. The EGEE user community feels it is now appropriate to meet to share experiences and to set new targets for the future, including both the evolution of the existing applications and the development and deployment of new applications on the EGEE infrastructure.

The EGEE User Forum will provide an important opportunity for innovative applications to establish contacts with EGEE and with other user communities, to plan for the future usage of the EGEE grid infrastructure, to learn about the latest advances, and to discuss the future evolution of the grid middleware. The main goal is to create a dynamic user community, starting from the base of existing users, which can increase the effectiveness of the current EGEE applications and promote the fast and efficient uptake of grid technology by new disciplines. EGEE fosters pioneering usage of its infrastructure by encouraging collaboration between diverse scientific disciplines. It does this to evolve and expand the services offered to the EGEE user community, maximising the scientific, technological and economic relevance of grid-based activities.

We would like to invite hands-on users of the EGEE Grid Infrastructure to submit an abstract for this event, following the suggested template.

EGEE User Forum Web Page
Participants
  • Adrian Vataman
  • Alastair Duncan
  • Alberto Falzone
  • Alberto Ribon
  • Ales Krenek
  • Alessandro Comunian
  • Alexandru Tudose
  • Alexey Poyda
  • Algimantas Juozapavicius
  • Alistair Mills
  • Alvaro del Castillo San Felix
  • Andrea Barisani
  • Andrea Caltroni
  • Andrea Ferraro
  • Andrea Manzi
  • Andrea Rodolico
  • Andrea Sciabà
  • Andreas Gisel
  • Andreas-Joachim Peters
  • Andrew Maier
  • Andrey Kiryanov
  • Aneta Karaivanova
  • Antonio Almeida
  • Antonio De la Fuente
  • Antonio Laganà
  • Antony Wilson
  • Arnaud Pierson
  • Arnold Meijster
  • Benjamin Gaidioz
  • Beppe Ugolotti
  • Birger Koblitz
  • Bjorn Engsig
  • Bob Jones
  • Boon Low
  • Catalin Cirstoiu
  • Cecile Germain-Renaud
  • Charles Loomis
  • Chollet Frédérique
  • Christian Saguez
  • Christoph Langguth
  • Christophe Blanchet
  • Christophe Pera
  • Claudio Arlandini
  • Claudio Grandi
  • Claudio Vella
  • Claudio Vuerli
  • Claus Jacobs
  • Craig Munro
  • Cristian Dittamo
  • Cyril L'Orphelin
  • Daniel Jouvenot
  • Daniel Lagrava
  • Daniel Rodrigues
  • David Colling
  • David Fergusson
  • David Horn
  • David Smith
  • David Weissenbach
  • Davide Bernardini
  • Dezso Horvath
  • Dieter Kranzlmüller
  • Dietrich Liko
  • Dmitry Mishin
  • Doina Banciu
  • Domenico Vicinanza
  • Dominique Hausser
  • Eike Jessen
  • Elena Slabospitskaya
  • Elena Tikhonenko
  • Elisabetta Ronchieri
  • Emanouil Atanassov
  • Eric Yen
  • Erwin Laure
  • Esther Acción García
  • Ezio Corso
  • Fabrice Bellet
  • Fabrizio Pacini
  • Federica Fanzago
  • Fernando Felix-Redondo
  • Flavia Donno
  • Florian Urmetzer
  • Florida Estrella
  • Fokke Dijkstra
  • Fotis Georgatos
  • Fotis Karayannis
  • Francesco Giacomini
  • Francisco Casatejón
  • Frank Harris
  • Frederic Hemmer
  • Gael Youinou
  • Gaetano Maron
  • Gavin McCance
  • Gergely Sipos
  • Giorgio Maggi
  • Giorgio Pauletto
  • Giovanna Stancanelli
  • Giuliano Pelfer
  • Giuliano Taffoni
  • Giuseppe Andronico
  • Giuseppe Codispoti
  • Hannah Cumming
  • Hannelore Hammerle
  • Hans Gankema
  • Harald Kornmayer
  • Horst Schwichtenberg
  • Huard Helene
  • Hugues Benoit-Cattin
  • Hurng-Chun Lee
  • Ian Bird
  • Ignacio Blanquer
  • Ilyin Slava
  • Iosif Legrand
  • Isabel Campos Plasencia
  • Isabelle Magnin
  • Jacq Florence
  • Jakub Moscicki
  • Jan Kmunicek
  • Jan Svec
  • Jaouher Kerrou
  • Jean Salzemann
  • Jean-Pierre Prost
  • Jeremy Coles
  • Jiri Kosina
  • Joachim Biercamp
  • Johan Montagnat
  • John Walk
  • John White
  • Jose Antonio Coarasa Perez
  • José Luis Vazquez
  • Juha Herrala
  • Julia Andreeva
  • Kerstin Ronneberger
  • Kiril Boyanov
  • Konstantin Skaburskas
  • Ladislav Hluchy
  • Laura Cristiana Voicu
  • Laura Perini
  • Leonardo Arteconi
  • Livia Torterolo
  • Losilla Guillermo Anadon
  • Luciano Milanesi
  • Ludek Matyska
  • Lukasz Skital
  • Luke Dickens
  • Malcolm Atkinson
  • Marc Rodriguez Espadamala
  • Marc-Elian Bégin
  • Marcel Kunze
  • Marcin Plociennik
  • Marco Cecchi
  • Mariusz Sterzel
  • Marko Krznaric
  • Markus Schulz
  • Martin Antony Walker
  • Massimo Lamanna
  • Massimo Marino
  • Miguel Cárdenas Montes
  • Mike Mineter
  • Mikhail Zhizhin
  • Mircea Nicolae Tugulea
  • Monique Petitdidier
  • Muriel Gougerot
  • Nadezda Fialko
  • Nadine Neyroud
  • Nick Brook
  • Nicolas Jacq
  • Nicolas Ray
  • Nils Buss
  • Nuno Santos
  • Osvaldo Gervasi
  • Othmane Bouhali
  • Owen Appleton
  • Pablo Saiz
  • Panagiotis Louridas
  • Pasquale Pagano
  • Patricia Mendez Lorenzo
  • Pawel Wolniewicz
  • Pedro Andrade
  • Peter Kacsuk
  • Peter Praxmarer
  • Philippa Strange
  • Philippe Renard
  • Pier Giovanni Pelfer
  • Pietro Lio
  • Pietro Liò
  • Rafael Leiva
  • Remi Mollon
  • Ricardo Brito da Rocha
  • Riccardo di Meo
  • Robert Cohen
  • Roberta Faggian Marque
  • Roberto Barbera
  • Roberto Santinelli
  • Rolandas Naujikas
  • Rolf Kubli
  • Rolf Rumler
  • Romier Genevieve
  • Rosanna Catania
  • Sabine Elles
  • Sandor Suhai
  • Sergio Andreozzi
  • Sergio Fantinel
  • Shkelzen Rugovac
  • Silvano Paoli
  • Simon Lin
  • Simone Campana
  • Soha Maad
  • Stefano Beco
  • Stefano Cozzini
  • Stella Shen
  • Stephan Kindermann
  • Steve Fisher
  • Tao-Sheng Chen
  • Texier Romain
  • Toan Nguyen
  • Todor Gurov
  • Tomasz Szepieniec
  • Tony Calanducci
  • Torsten Antoni
  • Tristan Glatard
  • Valentin Vidic
  • Valerio Venturi
  • Vangelis Floros
  • Vaso Kotroni
  • Venicio Duic
  • Vicente Hernandez
  • Victor Lakhno
  • Viet Tran
  • Vincent Breton
  • Vincent Lefort
  • Vladimir Voznesensky
  • Wei-Long Ueng
  • Ying-Ta Wu
  • Yury Ryabov
  • Ákos Frohner
    • 13:00 14:00
      Lunch 1h
    • 14:00 18:30
      1c: Earth Observation - Archaeology - Digital Library 40-SS-D01

      • 14:00
        Introduction to the parallel session 15m
      • 14:15
        DILIGENT and OpenDLib: long- and short-term exploitation of a gLite Grid Infrastructure 15m
        The demand for Digital Libraries (DLs) has recently grown considerably. DLs are perceived as a necessary instrument to support communication and collaboration among the members of communities of interest; many application domains require DL services, e.g. e-Health, e-Learning and e-Government, and many of the organizations that demand a DL are small, distributed, and dynamic, because they use the DL to support temporary activities such as courses, exhibitions, projects, etc. Nowadays the construction and management of a DL requires high investment and specialized personnel, because content production is very expensive and multimedia handling requires substantial computational resources. The effects are that years are spent in designing and setting up a DL, that DL systems lack interoperability, and that the services provided are difficult to reuse. This development model is not suitable to satisfy the demand of many organizations, so the purpose of DILIGENT is to create a Digital Library infrastructure that will allow members of dynamic virtual research organizations to create on-demand, transient digital libraries based on shared computing, storage, multimedia, multi-type content, and application resources. In this vision, digital libraries are not ends in themselves; rather, they are enabling technologies for digital asset management, electronic commerce, electronic publishing, teaching and learning, and other activities.
        DILIGENT is a three-year European-funded project that aims at developing a test-bed DL infrastructure able to create a multitude of DLs on demand, manage the resources of a DL (possibly provided by multiple organizations), and operate the DL during its lifetime. The DLs created by DILIGENT will be active on the same set of shared resources: content sources (i.e. repositories of information that are searchable and accessible), services (i.e. software tools that implement a specific functionality and whose descriptions, interfaces and bindings are defined and publicly available) and hosting nodes (i.e. networked entities that offer computing and storage capabilities and supply an environment for hosting content sources and services). By exploiting appropriate mechanisms provided by the DL infrastructure, producer organizations register their resources and provide a description of them. The infrastructure manages the registered resources by supporting their discovery, reservation and monitoring, and by implementing a number of functionalities that aim at supporting the required controlled sharing and quality of service. The composition of a DL is dynamic, since the services of the infrastructure continuously monitor the status of the DL resources and, if necessary, change the components of the DL in order to offer the best quality of service. By relying on the shared resources, many DLs serving different communities can be created and modified on the fly, without big investments and changes in the organizations that set them up. The DILIGENT infrastructure is being constructed by implementing a service-oriented architecture in a Grid framework. The DILIGENT design will be service oriented in order to provide as many reusable components as possible for other e-applications that could be created on top of the basic DILIGENT infrastructure. Furthermore, DILIGENT exploits the Grid middleware, gLite, and the Grid production infrastructure released by the Enabling Grids for E-sciencE (EGEE) project.
        By merging a service-oriented approach with Grid technology we can exploit the advantages of both. In particular, the Grid provides a framework where good control of the shared resources is possible. By taking full advantage of the scalable, secure, and reliable Grid infrastructure, each DL service will provide enhanced functionality with respect to the equivalent non-Grid-aware service. Moreover, the gLite Grid enables the execution of very computationally demanding applications, such as those required to process multimedia content. DILIGENT will enhance existing Grid services with the functionality needed to support the complex service interactions required to build, operate and maintain transient virtual digital libraries. In order to support the services of the DILIGENT framework and meet user community expectations, some key Grid services are needed: the Grid infrastructure should support a cost-effective DL operational model based on transient, flexible, coordinated "sharing of resources"; address the main DL architecture requirements (distribution, openness, interoperability, scalability, controlled sharing, availability, security, quality); provide a basic common infrastructure for serving several different application domains; and offer high storage and computing capabilities that enable the provision of powerful functionality on multimedia content, e.g. images and videos.
        From the conceptual point of view, the services that implement the DILIGENT infrastructure are organized in a layered architecture. The top layer, the Presentation layer, is user-oriented: it supports the automatic generation of user-community-specific portals, providing personalized access to the DLs. The Workflows layer contains services that make it possible to design and verify the specification of workflows, as well as services ensuring their reliable execution and optimization. Thanks to this set of services it is possible to expand the infrastructure with new and complex services capable of satisfying unanticipated user needs. The DL Components layer contains the services that provide the DL functionalities. Key functionalities provided by this area are: management of metadata; automatic translation for achieving metadata interoperability among disparate and heterogeneous content sources; content security through encryption and watermarking; archive distribution and virtualization; distributed search, access, and discovery; annotation; and cooperative work through distributed workspace management. The services of the lowest architectural layer, the Collective layer, jointly with those provided by the gLite Grid middleware released by the EGEE project, manage the resources and applications needed to run DLs. The set of resources and the sharing rules are complex, since multiple transient DLs are created on demand and are activated simultaneously on these resources.
        Following the first tests performed on the first releases of the gLite middleware, the following Grid requirements were identified: it should be possible to query for the maximum number of CPUs concurrently available (in order to allow a DILIGENT high-level service to automatically prepare a DAG where each node will be entitled to process a partition of the data collection), to use parametric jobs and automatic partitioning of data, to support service certificates for high-level services, to specify a job-specific priority, to specify a priority for a user or for a service, to ask for on-disk encryption of data, to dynamically manage VO creation, and to dynamically support user/service affiliation to a VO. DILIGENT will be demonstrated and validated by two complementary real-life application scenarios: one from the cultural heritage domain, one from the environmental e-Science domain. The former is an interesting challenge thanks to the multidisciplinary collaborative research, the image-based retrieval, the semantic analysis of images, and the support for research and teaching. The latter obliges DILIGENT to manage a wide variety of content types (maps, satellite images, etc.) with very large, dynamic data sets in order to support community events, report generation, and disaster recovery. The DILIGENT project collaborates with EGEE mainly through technical interactions (technical meetings, mainly with JRA1; gLite mailing list subscriptions; tutorials) and feedback on EGEE activities and on the DILIGENT project (gLite bug submission and grid-related DL requirements). DILIGENT now has two independent infrastructures (gLite v1.4): a Development Infrastructure (DDI) and a Testing Infrastructure (DTI). These infrastructures are geographically distributed, linking 6 sites in Athens, Budapest, Darmstadt, Pisa, Innsbruck and Rome. We have been running gLite experimentation tests on these infrastructures since July 2005 and have collected useful data about data and job management.
        As a first approach to exploiting the gLite Grid's on-demand storage and processing capabilities, we developed two experimental brokers that allow an existing digital library management system, named OpenDLib, to interface with the DDI. The gLite SE broker provides OpenDLib services with the pool of SEs available via the gLite software, and optimizes the usage of the available SEs. In particular, this service interfaces the gLite I/O server to perform the storage (put) and withdrawal (rm) of files and the access to them (get). In designing this service, one of our main goals was to provide a workaround to two main problems: inconsistency between the catalog and the storage resource management systems, and failure without notification in the access or remove operations. Although the gLite SE broker cannot by itself improve the reliability of the requested operations, we designed it to: (i) monitor its requests, (ii) verify the status of the resources after the processing of the operations, (iii) repeat the registration in the catalog and/or the storage of the file until it is considered correct or unrecoverable, and (iv) return a valid message reporting the exit status of the operation. The gLite WMS wrapper provides the other OpenDLib services with the computing power supplied by gLite CEs. The goal of this service is to provide a higher-level interface than those provided by the gLite components for managing jobs, i.e. applications that can run on CEs, and DAGs, i.e. directed acyclic graphs of dependent jobs.
        The gLite WMS wrapper has therefore been designed to: (i) deal with more than one WMS, (ii) monitor the quality of service provided by these WMSs by analyzing the number of managed jobs and the average time of their execution, and, finally, (iii) monitor the status of each submitted job by querying the Logging and Bookkeeping (LB) service.
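        A minimal sketch of the retry-and-verify pattern described above may help. The three helper functions (storage_put, catalog_register, verify_consistent) are hypothetical stand-ins for gLite I/O and catalog operations, not real gLite APIs; only the control flow reflects design points (i)-(iv).

            # Sketch of the SE-broker pattern: monitor each request, verify the
            # resource state after the operation, and retry until the result is
            # correct or deemed unrecoverable. The three helpers are hypothetical
            # stand-ins for gLite I/O and catalog calls, NOT real gLite APIs.
            import time

            MAX_RETRIES = 3

            def storage_put(local_path, se_url):
                pass                                # stand-in for a gLite I/O "put"

            def catalog_register(lfn, se_url):
                pass                                # stand-in for catalog registration

            def verify_consistent(lfn, se_url):
                return True                         # stand-in: compare catalog vs. storage

            def broker_put(local_path, lfn, se_url):
                """Store and register a file, retrying until catalog and storage
                agree ((ii)/(iii)), and report the exit status ((iv))."""
                status = "no attempt made"
                for attempt in range(1, MAX_RETRIES + 1):   # (i) monitor each request
                    try:
                        storage_put(local_path, se_url)
                        catalog_register(lfn, se_url)
                    except IOError as err:
                        status = f"attempt {attempt} failed: {err}"
                    else:
                        if verify_consistent(lfn, se_url):  # (ii) verify afterwards
                            return "OK"
                        status = f"attempt {attempt}: catalog/storage inconsistent"
                    time.sleep(attempt)             # back off, then repeat (iii)
                return f"UNRECOVERABLE: {status}"   # (iv) valid exit message

            print(broker_put("/tmp/report.pdf", "lfn:/diligent/report.pdf", "se://example"))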
        Speaker: Dr Davide Bernardini (CNR-ISTI)
        Slides
      • 14:30
        Data Grid Services for National Digital Archives Program in Taiwan 15m
        Digital archives/libraries are widely recognized as a crucial component of the global information infrastructure for the new century. Research and development projects in many parts of the world are concerned with using advanced information technologies for managing and manipulating digital information, ranging from data storage, preservation, indexing, searching, presentation, and dissemination capabilities to the organization and sharing of information over networks. A digital archive demands reliable storage systems for persistent digital objects, a well-organized information structure for effective content management, efficient and accurate information retrieval mechanisms, and flexible services for varying user needs. Hundreds of petabytes of digital information have been created and dispersed all over the internet since computers began to be used for information processing, and the amount still grows at a rate of tens of petabytes per year. Grid technology offers a possible solution for aggregating and processing diversified, heterogeneous, petabyte-scale digital archives. Metadata-based information representation makes specific and relative information retrieval more accurate, makes information resources interoperable, and paves the way for formal knowledge discovery. Taking advantage of advancing IT, semantic-level information indexing, categorizing, analyzing, tracking, retrieving and correlating can be implemented. A Data Grid aims to set up a computational and data-intensive grid of resources for data analysis. It requires coordinated resource sharing and collaborative processing and analysis of huge amounts of data produced and stored by many institutions.
        In Taiwan, the National Digital Archive Project (NDAP) was initiated in 2002, with its pilot phase started in 2001. According to the record in 2005, more than 60 terabytes of digital objects were generated and archived by 9 major content holders in Taiwan. Not only can delicate and gracious Chinese cultural assets be preserved and made available via the Internet, but this approach can also be proposed as a new paradigm for academic research based on digital and integrated information resources. The design and implementation phase is ongoing, and we would like to illustrate it at the EGEE User Forum. The Academia Sinica Grid Computing Centre (ASGC) is in charge of building a new generation of Grid-based research infrastructure in Academia Sinica and in Taiwan, based on EGEE and OSG as the Grid middleware. This infrastructure is a major component for the development and the deployment of NDAP, providing long-term preservation of the digital contents and unified data access. These services will be built upon the e-Science infrastructure of Taiwan.
        The Storage Resource Broker (SRB), developed at SDSC, is middleware which enables scientists to create, manage and collaborate with flexible, unified "virtual data collections" that may be stored on heterogeneous data resources distributed across a network. The SRB system is the first and the largest (in terms of data volume) data store in Academia Sinica right now. The system was deployed by ASGC in early 2004; it consists of 7 sites in different institutes, linked by a dedicated fibre campus network, and provides 60 TB of capacity in total. In early 2006 it will expand to 120 TB. As of January 2006, more than 30 TB and 1.4 million files have been archived in the distributed mass storage environment. All files are also preserved as two copies on different sites.
        In this presentation, ideas for utilizing the Data Grid infrastructure for NDAP will be depicted and discussed. We will describe the use of SRB in building a collaborative environment for the Data Grid services of NDAP. In this environment, many data-intensive applications are being developed. We also describe our integration experience in building applications for NDAP. For each application we characterize the essential data virtualization services provided by the SRB for distributed data management.
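        As a small illustration of the two-copy policy just described, the sketch below checks that every logical file has replicas at two distinct sites. list_replicas() and the host names are hypothetical stand-ins for an SRB/MCAT catalogue query, not the actual SRB interface.

            # Check the preservation policy: each archived file should have
            # copies at (at least) two different sites. list_replicas() is a
            # hypothetical stand-in for an SRB metadata catalogue (MCAT) query.
            def list_replicas(logical_name):
                # would return the physical locations recorded for the file
                return [("site-a.example.tw", "/srb/v1/" + logical_name),
                        ("site-b.example.tw", "/srb/v2/" + logical_name)]

            def under_replicated(logical_names):
                """Return the files that lack copies at two distinct sites."""
                bad = []
                for lfn in logical_names:
                    sites = {host for host, _path in list_replicas(lfn)}
                    if len(sites) < 2:
                        bad.append(lfn)
                return bad

            print(under_replicated(["ndap/artifact-0001.tif"]))   # -> [] if policy holds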
        Speaker: Mr Eric Yen (Academia SINICA Grid Computing Centre, Taiwan)
        Slides
      • 14:45
        Discussion 15m
      • 15:00
        Project gridification: the UNOSAT experience 15m
        The EGEE infrastructure is a key part of the computing environment for the simulation, processing and analysis of the data of the Large Hadron Collider (LHC) experiments (ALICE, ATLAS, CMS and LHCb). The example of the LHC experiments illustrates well the motivation behind Grid technology. The LHC accelerator will start operation in 2007, and the total data volume per experiment is estimated to be a few PB/year at the beginning of the machine's operation, leading to a total yearly production of several hundred PB for all four experiments around 2012. The processing of this data will require large computational and storage resources, and the associated human resources for operation and support. It was not considered feasible to fund all of the resources at one site, and so it was agreed that the LCG computing service would be implemented as a geographically distributed Computational Data Grid. This means the service will use computational and storage resources installed at a large number of computing sites in many different countries, interconnected by fast networks. At the moment, the EGEE infrastructure counts 160 sites distributed over more than 30 countries. These sites hold 15000 CPUs and about 9 PB of storage capacity. The Grid middleware hides much of the complexity of this environment from the user, organizing all the resources into a coherent virtual computer centre.
        The computational and storage capability of the Grid is attracting other research communities, and we would like to discuss the general patterns observed in supporting new applications and porting them onto the EGEE infrastructure. In this talk we present our experiences in porting different applications to the Grid, including Geant4 and UNOSAT. Geant4 is a toolkit for the Monte Carlo simulation of the interaction of particles with matter. It is applied to a wide field of research including high energy physics and nuclear experiments, and medical, accelerator and space physics studies. ATLAS, CMS, LHCb, BaBar, and HARP are actively using Geant4 in production. UNOSAT is a United Nations initiative to provide the humanitarian community with access to satellite imagery and Geographic Information System services. UNOSAT is implemented by the UN Institute for Training and Research (UNITAR) and managed by the UN Office for Project Services (UNOPS). In addition, partners from public and private organizations constitute the UNOSAT consortium. Among these partners, CERN participates actively, providing the computational and storage resources needed for their image analysis. During the gridification of the UNOSAT project, the collaboration with the developers of the ARDA group to adapt the AMGA software to the UNOSAT expectations was extremely important. The satellite images provided by UNOSAT have been stored in storage systems at CERN and registered in the LCG File Catalog (LFC). The registered files have been identified with easy-to-remember Logical File Names (LFNs), and the LFC catalog is able to map these LFNs to the physical location of the files. Due to the UNOSAT infrastructure, their users will provide as input the coordinates of each image. AMGA is able to map these coordinates (considered metadata information) to the corresponding LFN of the files registered in the Grid; the LFC will then find the physical location of the images.
        A successful model to guarantee a smooth and efficient entrance into the Grid environment is to identify an expert supporter to work with the new community. This person will assist the community during the implementation and execution of their applications on the Grid, and will also be the Virtual Organization (VO) contact person with the EGEE sites. This person will work together with the EGEE deployment team and with the site managers to set up the services needed by the experiment or community, observing also the relevant security and access policies. Once these new communities attain a good level of maturity and confidence, a VO manager would be identified within the user community. This talk will report a number of concrete examples and will try to summarize the main lessons. We believe that this should be extremely interesting for new communities, in order to identify possible problems early and prepare the appropriate solutions. In addition, this support scheme would also be very interesting as a model, for example, for local application support in EGEE-II.
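        A minimal sketch of the two-step lookup described above: AMGA maps image coordinates (metadata) to an LFN, and the LFC maps the LFN to physical replicas. Both functions and all names below are hypothetical stand-ins, not the real AMGA or LFC client APIs.

            # coordinates --(AMGA metadata query)--> LFN --(LFC lookup)--> replicas
            def amga_lookup(lat, lon):
                # stand-in: AMGA resolves coordinates to a logical file name
                return "lfn:/grid/unosat/images/tile_N46_E006.tif"

            def lfc_replicas(lfn):
                # stand-in: the LFC maps the LFN to physical storage locations
                return ["srm://storage.example.ch/unosat/tile_N46_E006.tif"]

            def locate_image(lat, lon):
                """Users supply only coordinates; the chain finds the copies."""
                lfn = amga_lookup(lat, lon)
                return lfc_replicas(lfn)

            print(locate_image(46.2, 6.1))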
        Speaker: Dr Patricia Mendez Lorenzo (CERN IT/PSS)
        Slides
      • 15:15
        International Telecommunication Union Regional Radio Conference and the EGEE grid 15m
        The Radiocommunication Bureau of the ITU (ITU-BR) manages the preparations for the ITU Regional Radio Conference RRC06, which will establish a new frequency plan for the introduction of digital broadcasting (bands III and IV/V) in Europe, Africa, the Arab States and former-USSR States. During the 5 weeks of the RRC06 conference (15 May to 16 June 2006), delegations from 119 Member States will negotiate the frequency plan. The frequency plan will be established in an iterative way. During the week, the administrations at the RRC06 will negotiate and submit their requirements to the ITU-BR, which will conduct over the subsequent weekend all the calculations (analysis and synthesis) that result in assigning specific frequencies for the draft plan. The output of the calculations will be the input for negotiations in the subsequent week, with the last iteration constituting the basis for the final frequency plan. In addition, partial calculations are envisaged for parts of the planning area in between two global iterations (for the entire planning area). To obtain optimum planning of the available frequency spectrum, two different software processes have been developed by the European Broadcasting Union, and they are run in sequence: compatibility assessment and plan synthesis. The compatibility assessment (which is very CPU-demanding and can be run on a distributed infrastructure) calculates the interference between digital requirements, analogue broadcasting and other services' stations. The plan synthesis assigns channels to requirements which could share the same channel. The limited time available to perform the calculation calls for the optimization of the process; the turnaround time to provide a new set of results will be a critical factor for the success of the conference.
        The EGEE grid will greatly enhance the resources available to the ITU-BR, allowing it to serve the conference better. The grid infrastructure will complement the client-server distributed system developed within the ITU-BR, which has been used for the first exercises. In addition, the possibility of performing faster calculations could improve the efficiency of the negotiation (for example, giving preliminary results during the negotiation weeks themselves, or allowing extra quality checks and compatibility studies). The compatibility assessment consists of running a large number of jobs (some tens of thousands). Each job is basically the same application running on different datasets representing the parameters of radio stations. One should note that the execution time varies by more than 3 orders of magnitude (the majority of jobs need only a few seconds, but a few jobs require many hours) depending on the input parameters, and cannot be completely predicted. To cope with this situation we decided to use a client-server system called DIANE that provides run-time load balancing, access to heterogeneous resources (Grid and local cluster at the same time) and a robust infrastructure to cope with run-time problems. In the DIANE terminology, a job is defined as a "task". DIANE uses the available resources in the most effective way, since each available worker node asks for the next task: while a long task will "block" one node, in the meantime the short tasks (the large majority) will flow through the other nodes.
        We have already demonstrated the ability to perform the required calculations on the EGEE/LCG infrastructure (in the first tests, we ran with a parallelism of the order of 50, observing the expected speed-up factor) and we are preparing, in close collaboration with CERN, to use these techniques during the conference later this year. The EGEE infrastructure not only enables us to give adequate support to an important international event; in addition, the substantial speed-up already observed opens the possibility of faster and more detailed studies during the conference. The technical improvement makes it possible to provide a better service and better technical data to the conference delegates. The present setup is well suited to the foreseen application. The possibility of accessing resources from the grid and corporate resources (which we are not yet exploiting) is very appealing and should be interesting for other users. The possibility of describing and executing more complex workflows (presently we are using the system to execute independent tasks in parallel) could increase the interest in the tools we are currently using.
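        The load-balancing argument above is easy to see in a toy simulation of worker-pull scheduling: each worker takes the next task as soon as it is free, so one multi-hour task occupies a single node while the thousands of short tasks flow through the others. This illustrates the idea only; it is not the DIANE API, and the task durations are invented.

            import heapq
            import random

            def makespan(durations, n_workers):
                """Total wall time when free workers pull tasks from one queue."""
                free_at = [0.0] * n_workers      # time at which each worker is free
                heapq.heapify(free_at)
                for d in durations:              # tasks are handed out in order
                    t = heapq.heappop(free_at)   # earliest-free worker pulls the task
                    heapq.heappush(free_at, t + d)
                return max(free_at)

            random.seed(1)
            # a workload spanning ~3 orders of magnitude, as at the RRC06
            tasks = [random.choice([5, 5, 5, 5, 10, 3600]) for _ in range(10_000)]
            print(f"50 workers: {makespan(tasks, 50) / 3600:.1f} h of wall time")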
        Speaker: Dr Andrea Manara (ITU BR)
        Slides
      • 15:30
        ArchaeoGRID, a GRID for Archaeology 15m
        Modern archaeology, situated between the historical, anthropological and social sciences, is among the disciplines most suitable and mature for the application of Grid technologies. Archaeology is a multidisciplinary historical science, using data and methods from many of the natural and social sciences. Archaeological research has made, and continues to make, large use of computers and digital technologies for data acquisition and storage, for quantitative and qualitative data analysis, for data visualisation, and for mathematical modeling and simulation. The Web is also used intensively for exchanging results, for communication, and for accessing large databases through Web Services technology. The interest of archaeologists in such methods is today more than a passing interest: there are many computational archaeologists throughout the world, and specialised quantitative archaeology laboratories experimenting with new methods in spatial analysis, geostatistics, geocomputation, artificial intelligence applications to archaeology, etc.
        Any material remains, artifacts and ecofacts, macroscopic and microscopic, present on the earth's surface and representing the material culture of past societies are relevant for archaeology, independently of their esthetic or economic value. Remains should be described according to their basic properties (shape, size, texture, composition, spatial and temporal location), which implies the use of sophisticated procedures for their computer representation: 3D geometry and realistic rendering, among them. Furthermore, data should be related spatially and temporally in complex ways. An archaeological site should thus be understood as a complex sequence of finite states of a spatio-temporal trajectory, where an original entity (the ground surface) is modified successively by accumulating things on it, by deforming a previous accumulation, or by direct physical modification (building, excavation). This spatio-temporal representation must be considered as a continuum made up of discrete, irregular, discontinuous geometrical shapes (surfaces, volumes) defined by additional characteristics (shape, texture, composition, as dependent variables of the model) which in turn influence the variation of every archaeological feature. The idea is that interfacial boundaries represent successive phases and are dynamically constructed. Within them, there should be some statistical relationship between the difference in value of the dependent regionalised variable which defines the discontinuity at any pair of points and their distance apart.
        The complexities of archaeological data processing are more demanding when we consider that archaeological analysis cannot be constrained to the study of a single site. In recent years archaeological research teams have become very interested in extended projects involving the study of many different sites over very large geographic regions and very long time spans. This work is especially relevant to the study of paleoclimatic human adaptations, of the mobility of hunter-gatherer societies, and of the origins of cities and early state formation. In these cases, the archaeological data produced by excavation and field survey, or retrieved from different types of available archives, are huge not only in quantity but also in diversity and complexity, and the computing power needed for their analysis, simulation and visualisation is very large.
        The purpose is then to work towards a landscape archaeology which can reconstruct the evolution of settlement organization in the studied region at low or high spatio-temporal resolution, in relation to the analysed level: intersite, intrasite or regional. Such a precise reconstruction of the geomorphology, hydrology, climate, land cover and land use of the region, based on known data, must be done using models and simulation. Moreover, as a social and historical science, such a simulation cannot stop at the physical elements; it should include the study of demographic variation, including demographic models, settlement and urban dynamics, and production and exchange models. All this means that archaeology is a computationally intensive discipline. Model building is time-consuming and resource-intensive, and archaeological data are huge. They are also unique in character, so they cannot be substituted and need care to preserve. Everything in our analysis has to be preserved and stored, together with the information about it. The results of simulated data must be preserved for a long time, because they represent the status of the data interpretation at a certain date and will be useful for future analysis (the "crisis of curation"). For the previous reasons, archaeology needs to exploit Grid technology for data access, storage and management, for data analysis, for simulation, and for the circulation of archaeological knowledge: from Web to Grid. ArchaeoGRID will offer a unique opportunity to share data, processing and model-building opportunities with other branches of science and to create synergy with other Grid projects (Earth Sciences, Digital Library, Astrophysics Grid projects, etc.).
        The starting project proposes to begin with the study of the origin of the city in the Mediterranean area between the XI and VIII centuries B.C., using the GILDA t-Infrastructure. The study will provide a functional framework for broad studies of the interactions of humans in ancient urban societies and with the environment. During the past fifteen years, archaeologists in the Mediterranean have accumulated large amounts of computerized data that have remained trapped in localized and often proprietary databases. It is now possible to change that situation: ArchaeoGRID will be built to facilitate ways in which such data might be brought together and shared between researchers, students, and the general public. Archaeological data always include an intrinsic geographic component, and the compilation and sharing of geographic data through GIS has become increasingly important in the governmental, private-sector and academic worlds during the past years. New Grid technologies for spatial data, the expansion of Web Services and the development of open GIS technology now make it possible to share geographic information quickly, widely and effectively. The first application running on GILDA will be related to paleoclimate and weather simulation in the regions where the urban centers originated around the IX and VIII centuries B.C.; weather phenomena, climate and climate change produced effects on individuals and societies in the past. In the near future, GILDA will be used to explore the possibilities of different computational methodologies and tools for the analysis of spatio-temporal data. Classical statistical analysis of spatio-temporal series will be used, but we also intend to develop new methods for longitudinal analysis based on neural network technology.
        Freely available simulation programs and data from the web will be used for the application. Such data can be integrated with data from archaeological excavations and surveys. The complexity and the size of the program code and data require the use of the MPI library for parallel calculation on GILDA computers running Linux. The open-source GRASS GIS and the R statistical package installed on GILDA will make it possible to prepare the input data for the full Mediterranean area and for the territories of the urban centers. A schematic architecture of ArchaeoGRID showing the relevant parts and their links will be presented. Given the intrinsic nature of archaeological field work, communication and information exchange between groups on site and groups working in distant laboratories, museums and universities need fast and efficient communication channels. Telearchaeology lies at the heart of the archaeological endeavor and could be very useful also for education and for the diffusion of archaeological knowledge. A multicast architecture for advanced videoconferencing, specially tailored for large-scale persistent collaboration, could be used. The added value, linked with new perspectives in archaeological and historical research, with the management of the archaeological heritage, with media production, with territory management and with tourism, will be discussed.
        Speaker: Prof. Pier Giovanni Pelfer (Dept. Physics, University of Florence and INFN, Italy)
        Slides
      • 15:45
        Discussion 15m
      • 16:00
        Coffee break 30m
      • 16:30
        Worldwide ozone distribution by using Grid infrastructure 15m
        ESRIN: L. Fusco, J. Linford, C. Retscher; IPSL: C. Boonne, S. Godin-Beekmann, M. Petitdidier, D. Weissenbach; KNMI: W. Som de Cerff; SCAI-FHG: J. Kraus, H. Schwichtenberg; UTV: F. Del Frate, M. Iapaolo
        Satellite data processing presents a challenge for any computing resource due to the large volume of data and the number of files. The vast data sets and databases are distributed among different countries and organizations, and their investigation is usually limited to sub-sets. As a matter of fact, all those data cannot be explored completely due, on the one hand, to the limitations of local computing and storage power, and, on the other hand, to the lack of tools adapted to handling, controlling and analysing such large sets of data efficiently. In order to check the capability of a Grid infrastructure to fulfil these requirements, an application based on ozone measurements was designed and ported first to DataGrid, then to EGEE and a local Grid at ESRIN. The satellite data are provided by the GOME experiment aboard the ERS satellite. From the ozone vertical total content, ozone profiles have been retrieved using two different algorithm schemes: one based on an inversion protocol (KNMI), the other on a neural network approach (UTV). The porting to DataGrid was successful; however, some functionalities were missing to make the application operational. On EGEE, the infrastructure has been as reliable as a local Grid. The second part of the application has been the validation of the satellite ozone profiles against profiles measured by ground-based lidars. The goal was to find collocated observations, and metadata databases were built to solve this problem. The result has been the production of 7 years of data on EGEE and on the local Grid at ESRIN with two versions of the neural network algorithm, and of several months of data with the inversion algorithm; in total, around 100,000 files are registered on EGEE. The validation of this data set was then carried out using all the lidar profiles available in the NDSC (Network for the Detection of Stratospheric Change) databases. To find collocated data, an OGSA-DAI metadata server has been implemented, and geospatial queries permit searching for the orbits passing over a lidar site. The second piece of work, started during DataGrid, has been the development of a portal specific to the ozone application described above, extended later to other satellite data such as MERIS. The role of this portal is to provide an operational, user-friendly way to use the Grid infrastructure; it provides the functionalities missing from the Grid infrastructure. EGEE offers the possibility to store all the ozone data obtained by satellite experiments (GOME, GOMOS, MIPAS, ...) as well as by ground-based networks of lidars and radiosoundings. The next goal is to be able to find the distribution of ozone at a given location and/or a given time by combining all the existing databases. In this presentation, the scientific and operational interest will be pointed out.
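        The collocation search mentioned above boils down to a geospatial-temporal window query; a minimal sketch follows. The 300 km / 6 h thresholds, the record layout and the sample values are illustrative assumptions, not the project's actual schema.

            from math import radians, sin, cos, asin, sqrt

            def haversine_km(lat1, lon1, lat2, lon2):
                """Great-circle distance between two points, in kilometres."""
                dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
                a = (sin(dlat / 2) ** 2
                     + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
                return 2 * 6371.0 * asin(sqrt(a))

            def collocated(profiles, site, max_km=300.0, max_hours=6.0):
                """profiles: (time_h, lat, lon) tuples; site: (lat, lon, time_h)."""
                s_lat, s_lon, s_time = site
                return [p for p in profiles
                        if abs(p[0] - s_time) <= max_hours
                        and haversine_km(p[1], p[2], s_lat, s_lon) <= max_km]

            # one satellite profile near a lidar site, one far away (sample values)
            profiles = [(12.0, 44.1, 5.9), (13.5, 10.0, 60.0)]
            print(collocated(profiles, (43.9, 5.7, 12.5)))   # -> keeps the first only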
        Speaker: Monique Petitdidier (IPSL)
        Slides
        Video
      • 16:45
        On-line demonstration of Flood application at EGEE User Forum 15m
        The flood application was successfully demonstrated at the second EGEE review in December, and we would like to demonstrate it at the EGEE User Forum for Grid application developers and Grid users. The flood application consists of several numerical models of meteorology, hydrology and hydraulics. A portal has been developed for convenient use of the flood application. The portal has four main modules:
        • Workflow management module: manages the execution of tasks with data dependences
        • Data management module: allows users to search and download data from storage elements
        • Visualization module: shows the output from the models in several forms: text, pictures, animation and virtual reality
        • Collaboration module: allows users to communicate with each other and cooperate on flood forecasting
        The demonstration will be done on the GILDA demonstration testbed. Job execution in the Grid testbed will be performed using the gLite middleware. The aim of the demonstration is to show how to implement complicated grid applications with many models and supporting modules, as well as the FloodGrid portal, which allows users to run the application without knowledge of grid computing.
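        A minimal sketch of the data-dependent cascade that the workflow management module handles: meteorology output feeds hydrology, whose output feeds hydraulics. The run_* functions are hypothetical stand-ins for grid job submissions, not the portal's actual interface.

            def run_meteorology(region):
                return {"rainfall": f"rainfall field for {region}"}    # stand-in job

            def run_hydrology(meteo_out):
                return {"discharge": "river discharge from rainfall"}  # needs meteo output

            def run_hydraulics(hydro_out):
                return {"flood_map": "inundation map from discharge"}  # needs hydro output

            PIPELINE = [run_meteorology, run_hydrology, run_hydraulics]

            def forecast(region):
                """Run the cascade; each stage starts only when its input exists."""
                data = region
                for stage in PIPELINE:
                    data = stage(data)
                return data

            print(forecast("example river basin"))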
        Speaker: Dr Viet Tran (Institute of Informatics, Slovakia)
        Slides
      • 17:00
        Solid Earth Physics on EGEE 15m
        This abstract describes the "Solid Earth Physics" applications of the ESR (Earth Science Research) VO. These applications, developed or ported by the Institut de Physique du Globe de Paris (IPGP), mainly address seismology, both data processing and simulation. Solid Earth Physics has successfully deployed two applications on EGEE: the first allows the rapid determination of earthquake mechanisms, and the second, SPECFEM3D, allows the numerical simulation of earthquakes in complex three-dimensional geological models. A third application, currently being ported, will allow gravity gradiometry studies from GOCE satellite data.
        1) Rapid determination of earthquake centroid moment tensors (E. Clévédé, IPGP). The goal of this application is to provide first-order information on the seismic source of large earthquakes occurring worldwide. This information comprises: the centroid, which corresponds to the location of the space-time barycenter of the rupture; and the first moments of the rupture in the point-source approximation, which are the scalar moment giving the seismic energy released (from which the moment magnitude is deduced), the source duration, and the moment tensor that describes the global mechanism of the source (from which the orientation of the rupture plane and the kind of displacement on this plane are deduced). The data used are three-component, long-period seismic signals (from 1 to 10 mHz) recorded worldwide. In the case of a 'rapid' determination we use data from the GEOSCOPE network, which allows us to obtain records from a dozen stations within a few hours after the occurrence of the event. In order to deal with the trade-off between centroid and moment tensor determinations, the centroid and the source duration are estimated by an exploration over a space-time grid (longitude, latitude, depth and source duration). When the centroid is supposed known and fixed, the relation between the moment tensor and the data is linear. Then, for each point of the centroid parameter space, we compute Green functions (one for each of the 6 elements of the moment tensor) for each receiver, and proceed to linear inversions in the spectral domain for each source duration. The best solution is determined by the data fit. This application is well adapted to the EGEE grid, as each point of the centroid parameter space can be treated independently, the main part of the computation time being the Green function computation. For a single point, a run is performed in a few minutes. In a typical case, an exploration grid (longitude, latitude, depth and source duration) of 10x10x10x10 requires about 100 hours of computation time, which is reduced to about 1 hour when spread over a hundred different jobs submitted to the EGEE grid. The new workflow features provided by gLite should allow the simplification of the management of the different steps of a run.
        2) SPECFEM3D: numerical simulation of earthquakes in complex three-dimensional geological models (D. Komatitsch, MIGP; G. Moguilny, IPGP). The spectral-element method (SEM) for regional-scale seismic wave propagation problems is used to model wave propagation at high frequencies and for complex geological structures. Simulations based upon a detailed sedimentary basin model and this accurate numerical technique generally produce good waveform fits between the data and 3-D synthetic seismograms. Moreover, remaining discrepancies between the data and the synthetic seismograms could ultimately be utilized to improve the velocity model through a structural inversion, or the source parameters through a centroid moment-tensor (CMT) inversion. This application, written in Fortran 90 and using MPI, is very scalable: it has already run outside EGEE on 1994 processors of the Japanese Earth Simulator, and inside EGEE on 64 processors at Nikhef (NL). The amounts of disk space and memory depend on the input parameters but are never very large. However, this application has some technical constraints: the I/O has to be done both in local files (on each node) and in shared files (seen by all nodes), and the script must be able to submit 2 executable files sequentially, which use the same nodes in the same order. This is because the SPECFEM3D software package consists of two different codes, a mesher and a solver, which work on the same data. Some successful tests have been done with gLite, but the problem of distinguishing a node (with several CPUs) from a CPU when requesting resources does not seem to be solved. It would also be interesting to have access to "fast clusters" (with high-throughput, low-latency networks such as Myrinet or SCI) and, to access larger configurations, to have the possibility of using various sites during a given run.
        3) Gravity gradiometry (G. Pajot, IPGP). The GOCE satellite (see [1]) is to be launched by the European Space Agency by the end of this year. Onboard is an instrument called a gradiometer, which measures the spatial derivatives of the gravity field in three independent directions of space. Although gravity gradiometry was born more than a century ago and was successfully used for geophysical prospecting, the GOCE satellite will provide the first set of gravity gradiometry data for the whole Earth with unprecedented spatial resolution and accuracy, and specific methods have to be developed. Thanks to these data, we will be able to derive information about the Earth's inner mass distribution patterns at various scales (from the sedimentary basin to the Earth's mantle). To this aim, we are developing a pseudo Monte Carlo inversion method (see [2]) to interpret GOCE data. One step of it is the model generation, which is its limiting factor. A model is a possible density distribution, to which correspond calculated gravity gradients as they would be measured by the instrument. These calculated gradients are compared to those actually measured; the nearer they are to the measured ones, the closer the model is to the real Earth. One rough pseudo-random model takes about 5 minutes to generate on a 2.8 GHz CPU, the finest ones take up to 20 minutes, and a set of 1000 models, each one independent of the others, is a good basis to start the model space exploration. Thus, EGEE is the perfect framework in which to develop such an application. We test and validate our algorithm using a set of marine gradiometry measurements provided by the Bell Geospace company. These data require frequent, restricted access. First results of the application and solutions to the confidentiality problem are presented here. References: [1] http://ganymede.ipgp.jussieu.fr/frog/ [2] Sambridge, M., Geophysical inversion with a neighbourhood algorithm - I. Searching a parameter space, Geophys. J. Int., 138, 479-494, 1999.
        In conclusion, the main goal of these three applications is to create a Grid-based infrastructure to process, validate and exchange large sets of data within the worldwide Solid Earth physics community, as well as to provide facilities for distributed computing. The stability of the infrastructure and the ease of use of the Grid are prerequisites to reach these objectives and bring the community to use the Grid facilities.
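        Since each point of the centroid parameter space is independent, the 10x10x10x10 exploration of application 1) maps naturally onto about a hundred grid jobs; a minimal sketch of the splitting follows. The chunk size and the evaluate_point() stub (Green functions plus linear inversion) are assumptions for illustration.

            from itertools import product

            def grid_points(lons, lats, depths, durations):
                """All candidate centroids of the 4-D exploration grid."""
                return list(product(lons, lats, depths, durations))

            def chunk(points, n_jobs):
                """Split the grid into n_jobs roughly equal, independent pieces."""
                size = (len(points) + n_jobs - 1) // n_jobs
                return [points[i:i + size] for i in range(0, len(points), size)]

            def evaluate_point(p):
                pass    # stand-in: Green functions + linear inversion, fit score

            axes = [range(10)] * 4                   # 10x10x10x10 candidates
            jobs = chunk(grid_points(*axes), 100)    # one sublist per EGEE job
            print(len(jobs), "jobs x", len(jobs[0]), "points each")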
        Speaker: Geneviève Moguilny (Institut de Physique du Globe de Paris)
        Slides
      • 17:15
        Discussion 15m
      • 17:30
        Expanding GEOsciences On DEmand 15m
        The world's population faces difficult challenges in the coming years: producing enough energy to sustain global growth and predicting major events of the Earth such as earthquakes. Seismic data processing and reservoir simulation are key technologies that help researchers in geosciences tackle these challenges. Modern seismic data processing and geophysical simulations require ever greater amounts of computing power, data storage and sophisticated software. The research community hardly keeps pace with this evolution, resulting in difficulties for small or medium research centres to exploit their innovative algorithms. Grid computing is an opportunity to foster the sharing of computer resources and to give access to large computing power for a limited period of time at an affordable cost, as well as to share data and sophisticated software. The capability to solve new complex problems and validate innovative algorithms on real-scale problems is also a way to attract and keep the brightest researchers, for the benefit of both the academic and industrial R&D geosciences communities.
        Under the "umbrella" of the EGEE infrastructure project, the EGEODE ("Expanding Geosciences On Demand") open Virtual Organization was created. EGEODE is dedicated to research in geosciences, for both public and private industrial research & development and academic laboratories. The Geocluster software, which includes several tools for signal processing, simulation and inversion, enables researchers to process seismic data and to explore the composition of the Earth's layers. In addition to Geocluster, which is used only for R&D, CGG (http://www.cgg.com) develops, markets and supports a broad range of geosciences software systems covering seismic data acquisition and processing, as well as geosciences interpretation and data management. Many typical Grid computing projects target pure research domains in infrastructure, middleware and usage, such as High Energy Physics, Bioinformatics and Earth Observation. EGEODE moves the focus towards collaboration between industry and academia. There are two main potential impacts: 1) the transfer of know-how and services to industry; 2) the consolidation and extension of the EGEODE community, which includes both industrial and academic research centres.
        The general benefits of grid computing are:
        - Access to computing resources without investing in a large IT infrastructure.
        - An optimised IT infrastructure:
          o load balancing between processing centres;
          o smoothing of production peaks;
          o service continuity and a business continuity plan;
          o more fault-tolerant systems and applications;
          o leveraging of processing centre capacity.
        - A lower total cost of IT, by sharing available resources with other members of the community.
        The specific benefits for the research community are:
        - Easy access to academic software and to comprehensive, industrial software.
        - Freeing the researcher from the additional burden of managing IT hardware and software complexity and limitations.
        - A framework to share data and project resources with other teams across Europe and worldwide.
        - Sharing of best practices, support, and expertise.
        - Enabling cross-organizational teamwork and partnership.
        Some of these benefits have been demonstrated through other Grid projects and need to be validated in our geosciences community. Sharing IT resources and data is typically the primary goal of a Grid project. Early indicators in our VO show that facilitating access to software and simplifying the management of hardware and software complexity are also extremely important.
        Speaker: Mr Gael Youinou (Unknown)
        Slides
      • 17:45
        Requirements of Climate applications on Grid infrastructures: C3-Grid and EGEE 15m
        Human-made climate change and its impact on the natural and socio-economic environment is one of today's most challenging problems for mankind. To understand and project the processes, changes and impacts of the natural and socio-economic system, a growing community of researchers from various disciplines investigates and analyses the earth system by means of computer simulation and analysis models. These models are usually computationally demanding and data intensive, as they need to compute and store highly resolved 4-dimensional fields of various parameters. Moreover, the required close collaboration in interdisciplinary, and often also international, research projects involves intensive community interaction. To support climate workflows the community has established proprietary, mostly national or regional solutions, which are normally grouped around centralized high performance computing and storage resources. Homogeneous discovery of and access to climate data sets residing in distributed petabyte climate archives, as well as distributed processing and efficient exchange of climate data, are the central components of future international climate research. Thus, the EGEE infrastructure potentially offers a highly suitable environment for such applications. However, existing grid infrastructures - including EGEE - do not yet meet the requirements of the climate community that are essential for prevalent workflows. Hence, to port existing applications and workflows to the EGEE infrastructure, a stepwise extension of the infrastructure with community-specific services is needed. Moreover, the identification and demonstration of feasibility and added value is essential to convince the community to change its established habits. The Collaborative Climate Community Data and Processing Grid (C3-Grid [1]) is an application-driven approach towards the deployment of Grid techniques for climate data analysis. Solutions currently developed in this project offer a potentially fruitful basis for improving the suitability of the EGEE infrastructure as a platform for data analysis within climate research.
        Within EGEE, climate is part of the Earth Science Research (ESR) VO. We evaluated and tested the use of the EGEE infrastructure for climate applications [4]. As part of this, prototypes of simulation as well as analysis software were tested on the EGEE infrastructure. We identified 3 different access points for pilot applications that can demonstrate the potential benefit of the EGEE infrastructure for climate research: ensemble simulations with models of intermediate complexity, coupling experiments on a common platform, and data sharing and analysis. Ensembles of simulations performed with the same model but different future scenarios and different parameterisations are required to quantify the uncertainty and possible variety of future climate predictions. EGEE offers a good infrastructure for such ensemble simulations with models of intermediate complexity, which do not need the performance of a supercomputer. Ensembles can be submitted as DAG, parametric or collection jobs, and the results can be directly stored, analysed and reduced to the required information on the grid. The coupling of diverse models of different disciplines is essential to understand the interaction and feedback between the different climate and earth system components, e.g. the human impact on future climate development.
        In corresponding projects, partners from different institutes in different nations collaborate on a common modeling framework. The EGEE infrastructure would be a valuable platform for such coupling approaches: data, models and output could be easily shared, and different access and user rights can be established via VOMS. Currently different coupling tools are being explored to assess their "grid suitability". Data sharing and analysis is a central aspect of climate research. The enormous amounts of data produced by the model simulations need to be analysed, visualised and validated against observations or other data sources to be correctly interpreted. This involves a multiplicity of statistical calculations carried out on samples of different large data files. Currently such data analysis is centred around heterogeneous database systems, which are accessed via non-standardised metadata. Thus, the establishment of a common data exchange and management infrastructure, bridging the existing heterogeneous community data management solutions with the EGEE data management system, would add great value to such applications. Especially for the realisation of climate data sharing and analysis workflows on EGEE, the following components need to be developed: 1) a commonly agreed metadata schema for the discovery of climate data sets stored in grid file space as well as in external community data centers; 2) a common community metadata catalogue based on this schema; 3) common interfaces to reference and access grid-external data resources (mainly databases). All of these aspects are addressed within the recently introduced German C3-Grid [1] project within the German e-science (D-Grid [2]) initiative, which aims to develop a grid middleware specific to the needs of the climate research community. Within this project a common metadata schema is being defined, a community metadata catalogue and information system is being established, and a common data access interface will be defined. To promote EGEE as a climate data handling (and postprocessing) infrastructure based on these developments, we propose a stepwise approach: establishment of an international-standards-based climate metadata catalog (e.g. based on AMGA), plus a common push/pull metadata exchange with grid-external metadata catalogues via established metadata harvesting protocols; establishment of data access to (initially free) climate datasets in climate data centers - as an initial starting point we need an easy way to access data in climate data centers and to copy/register them on grid storage, e.g. by using proprietary access clients or OGSA-DAI; and adaptation of commonly used climate data processing toolkits such as cdo [3] to EGEE. [1] http://www.c3grid.de [2] http://www.d-grid.de [3] http://www.mpimet.mpg.de/~cdo/ [4] Stephan Kindermann, EGEE infrastructure and Grids for Earth Sciences and Climate Research, Technical report, DKRZ (available at http://c3grid.dkrz.de/moin.cgi/PublicDocs)
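        A minimal sketch of how such an ensemble could be laid out as independent grid jobs, one per (scenario, parameterisation) pair, of the kind submitted as a parametric or collection job. The scenario names, the perturbed parameter and the job-description fields are illustrative assumptions, not a real job description language.

            from itertools import product

            scenarios = ["A2", "B1", "A1B"]            # hypothetical emission scenarios
            sensitivities = [2.0, 3.0, 4.5]            # perturbed parameter values (K)

            def make_ensemble():
                """One independent job description per ensemble member."""
                return [{
                    "executable": "run_model.sh",      # hypothetical wrapper script
                    "arguments": f"--scenario {s} --sensitivity {c}",
                    "output": f"member_{s}_{c}.nc",
                } for s, c in product(scenarios, sensitivities)]

            ensemble = make_ensemble()
            print(len(ensemble), "ensemble members")   # 9 independent grid jobs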
        Speaker: Dr Joachim Biercamp (DKRZ)
        Slides
      • 18:00
        Discussion 15m
    • 12:30 14:00
      Lunch 1h 30m
    • 13:00 14:00
      Lunch 1h