EGEE User Forum

Europe/Zurich
CERN

Description

The EGEE (Enabling Grids for E-sciencE) project provides the largest production grid infrastructure for applications. In the first two years of the project an increasing number of diverse user communities have been attracted by the possibilities offered by EGEE and have joined the initial user communities. The EGEE user community now feels it is appropriate to meet, to share experiences, and to set new targets for the future, including both the evolution of the existing applications and the development and deployment of new applications on the EGEE infrastructure.

The EGEE Users Forum will provide an important opportunity for innovative applications to establish contacts with EGEE and with other user communities, to plan for the future usage of the EGEE grid infrastructure, to learn about the latest advances, and to discuss the future evolution of the grid middleware. The main goal is to create a dynamic user community, starting from the base of existing users, which can increase the effectiveness of the current EGEE applications and promote the fast and efficient uptake of grid technology by new disciplines. EGEE fosters pioneering usage of its infrastructure by encouraging collaboration between diverse scientific disciplines. It does this to evolve and expand the services offered to the EGEE user community, maximising the scientific, technological and economic relevance of grid-based activities.

We would like to invite hands-on users of the EGEE Grid Infrastructure to submit an abstract for this event, following the suggested template.

Participants
• Alastair Duncan
• Alberto Falzone
• Alberto Ribon
• Ales Krenek
• Alessandro Comunian
• Alexandru Tudose
• Alexey Poyda
• Algimantas Juozapavicius
• Alistair Mills
• Alvaro del Castillo San Felix
• Andrea Barisani
• Andrea Caltroni
• Andrea Ferraro
• Andrea Manzi
• Andrea Rodolico
• Andrea Sciabà
• Andreas Gisel
• Andreas-Joachim Peters
• Andrew Maier
• Andrey Kiryanov
• Aneta Karaivanova
• Antonio Almeida
• Antonio De la Fuente
• Antonio Laganà
• Antony wilson
• Arnaud PIERSON
• Arnold Meijster
• Benjamin Gaidioz
• Beppe Ugolotti
• Birger Koblitz
• Bjorn Engsig
• Bob Jones
• Boon Low
• Catalin Cirstoiu
• Cecile Germain-Renaud
• Charles Loomis
• CHOLLET Frédérique
• Christian Saguez
• Christoph Langguth
• Christophe Blanchet
• Christophe Pera
• Claudio Arlandini
• Claudio Grandi
• Claudio Vella
• Claudio Vuerli
• Claus Jacobs
• Craig Munro
• Cristian Dittamo
• Cyril L'Orphelin
• Daniel JOUVENOT
• Daniel Lagrava
• Daniel Rodrigues
• David Colling
• David Fergusson
• David Horn
• David Smith
• David Weissenbach
• Davide Bernardini
• Dezso Horvath
• Dieter Kranzlmüller
• Dietrich Liko
• Dmitry Mishin
• Doina Banciu
• Domenico Vicinanza
• Dominique Hausser
• Eike Jessen
• Elena Slabospitskaya
• Elena Tikhonenko
• Elisabetta Ronchieri
• Emanouil Atanassov
• Eric Yen
• Erwin Laure
• Esther Acción García
• Ezio Corso
• Fabrice Bellet
• Fabrizio Pacini
• Federica Fanzago
• Fernando Felix-Redondo
• Flavia Donno
• Florian Urmetzer
• Florida Estrella
• Fokke Dijkstra
• Fotis Georgatos
• Fotis Karayannis
• Francesco Giacomini
• Francisco Casatejón
• Frank Harris
• Frederic Hemmer
• Gael youinou
• Gaetano Maron
• Gavin McCance
• Gergely Sipos
• Giorgio Maggi
• Giorgio Pauletto
• giovanna stancanelli
• Giuliano Pelfer
• Giuliano Taffoni
• Giuseppe Andronico
• Giuseppe Codispoti
• Hannah Cumming
• Hannelore Hammerle
• Hans Gankema
• Harald Kornmayer
• Horst Schwichtenberg
• Huard Helene
• Hugues BENOIT-CATTIN
• Hurng-Chun LEE
• Ian Bird
• Ignacio Blanquer
• Ilyin Slava
• Iosif Legrand
• Isabel Campos Plasencia
• Isabelle Magnin
• Jacq Florence
• Jakub Moscicki
• Jan Kmunicek
• Jan Svec
• Jaouher KERROU
• Jean Salzemann
• Jean-Pierre Prost
• Jeremy Coles
• Jiri Kosina
• Joachim Biercamp
• Johan Montagnat
• John Walk
• John White
• Jose Antonio Coarasa Perez
• José Luis Vazquez
• Juha Herrala
• Julia Andreeva
• Kerstin Ronneberger
• Kiril Boyanov
• Kiril Boyanov
• Konstantin Skaburskas
• Laura Cristiana Voicu
• Laura Perini
• Leonardo Arteconi
• Livia Torterolo
• Luciano Milanesi
• Ludek Matyska
• Lukasz Skital
• Luke Dickens
• Malcolm Atkinson
• Marc-Elian Bégin
• Marcel Kunze
• Marcin Plociennik
• Marco Cecchi
• Mariusz Sterzel
• Marko Krznaric
• Markus Schulz
• Martin Antony Walker
• Massimo Lamanna
• Massimo Marino
• Miguel Cárdenas Montes
• Mike Mineter
• Mikhail Zhizhin
• Mircea Nicolae Tugulea
• Monique Petitdidier
• Muriel Gougerot
• Nick Brook
• Nicolas Jacq
• Nicolas Ray
• Nils Buss
• Nuno Santos
• Osvaldo Gervasi
• Othmane Bouhali
• Owen Appleton
• Pablo Saiz
• Panagiotis Louridas
• Pasquale Pagano
• Patricia Mendez Lorenzo
• Pawel Wolniewicz
• Peter Kacsuk
• Peter Praxmarer
• Philippa Strange
• Philippe Renard
• Pier Giovanni Pelfer
• Pietro Lio
• Pietro Liò
• Rafael Leiva
• Remi Mollon
• Ricardo Brito da Rocha
• Riccardo di Meo
• Robert Cohen
• Roberta Faggian Marque
• Roberto Barbera
• Roberto Santinelli
• Rolandas Naujikas
• Rolf Kubli
• Rolf Rumler
• Romier Genevieve
• Rosanna Catania
• Sabine ELLES
• Sandor Suhai
• Sergio Andreozzi
• Sergio Fantinel
• Shkelzen RUGOVAC
• Silvano Paoli
• Simon Lin
• Simone Campana
• Stefano Beco
• Stefano Cozzini
• Stella Shen
• Stephan Kindermann
• Steve Fisher
• tao-sheng CHEN
• Texier Romain
• Toan Nguyen
• Todor Gurov
• Tomasz Szepieniec
• Tony Calanducci
• Torsten Antoni
• tristan glatard
• Valentin Vidic
• Valerio Venturi
• Vangelis Floros
• Vaso Kotroni
• Venicio Duic
• Vicente Hernandez
• Victor Lakhno
• Viet Tran
• Vincent Breton
• Vincent LEFORT
• Wei-Long Ueng
• Ying-Ta Wu
• Yury Ryabov
• Ákos Frohner
• Wednesday, March 1
• User Forum Plenary 1 500/1-001 - Main Auditorium

• 9:30 AM
Registration and coffee
• 1
Welcome
Speaker: Frederic Hemmer
• 2
Setting the scene
Speaker: Bob Jones (CERN)
• 3
The Grid and the Biomedical community: achievements and open issues
Speaker: Isabelle Magnin (INSERM Lyon)
• 4
The Grid and the LHC experiments: achievements and open issues
Speaker: Nick Brook (CERN and Bristol University)
• 5
Experience integrating new applications in EGEE
Speaker: Roberto Barbera (University of Catania and INFN)
• 1:00 PM
Lunch
• 1a: Life Sciences 40-SS-C01

• 6
GPS@: Bioinformatics grid portal for protein sequence analysis on EGEE grid
One of the current major challenges in bioinformatics is to derive valuable information from the complete genome sequencing projects, which provide the community with a large number of unknown sequences. The first prerequisite step in this process is to access up-to-date sequence and 3D-structure databanks (EMBL, GenBank, SWISS-PROT, Protein Data Bank...) maintained by several bio-computing centres (NCBI, EBI, EMBL, SIB, INFOBIOGEN, PBIL, …). For efficiency reasons, sequences should be analyzed using the maximal number of methods on a minimal number of different Web sites. To achieve this, we developed a Web server called NPS@ [1] (Network Protein Sequence Analysis) that provides biologists with many of the most common tools for protein sequence analysis, through a classic Web browser such as Netscape or through networked protein client software such as MPSA [2]. Today, the available genomic and post-genomic web portals have to deal with their local CPU and storage resources, which is why portal administrators usually restrict the methods and databanks available. Grid computing [3], as in the European EGEE project [4], is a viable solution to overcome these limitations and to bring computing resources suited to the genomic research field. Nevertheless, the current job submission process on the EGEE platform is relatively complex and unsuitable for automation. The user has to install an EGEE user interface machine on a Linux computer (or ask for an account on a public one), log on to it remotely, manually initialise a certificate proxy for authentication, specify the job arguments to the grid middleware using the Job Description Language (JDL) and then submit the job through a command-line interface. Next, the grid user has to check the resource broker periodically for the status of the job: "Submitted", "Ready", "Scheduled", "Running", etc., until the "Done" status. As a final command, the results have to be retrieved with a raw file transfer from the remote storage area to the local file system. This mechanism is off-putting for most scientists who are not familiar with advanced computing techniques. We therefore decided to provide biologists with a user-friendly interface to the EGEE computing and storage resources by adapting our NPS@ web site. We have called this new portal GPS@, for "Grid Protein Sequence Analysis"; it can be reached online at http://gpsa.ibcp.fr, currently for experimental tests only. In GPS@, we simplify the grid analysis query: the GPS@ Web portal runs its own EGEE low-level interface and provides biologists with the same interface that they use daily in NPS@. They only have to paste their protein sequences or patterns into the corresponding field of the submission web page; pressing the "submit" button then launches the execution of these jobs on the EGEE platform. All EGEE job submission, scheduling and status tracking is encapsulated in the GPS@ back office, and finally the results of the bioinformatics jobs are displayed in a new Web page, ready for further analyses or for download in the appropriate data format. [1] NPS@: Network Protein Sequence Analysis. Combet C., Blanchet C., Geourjon C. and Deléage G. TIBS, 2000, 25, 147-150. [2] MPSA: Integrated System for Multiple Protein Sequence Analysis with client/server capabilities. Blanchet C., Combet C., Geourjon C. and Deléage G. Bioinformatics, 2000, 16, 286-287. [3] Foster, I. and Kesselman, C. (eds.): The Grid 2: Blueprint for a New Computing Infrastructure, 2004. [4] Enabling Grids for E-sciencE (EGEE), online at www.eu-egee.org
Speakers: Dr Christophe Blanchet (CNRS IBCP), Mr Vincent Lefort (CNRS IBCP)
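The submission cycle described in the abstract (write a JDL file, submit it, poll the resource broker until "Done", then retrieve the output) is exactly what the GPS@ back office hides from the biologist. The following is a minimal illustrative sketch of that cycle, not the GPS@ implementation: the edg-job-* commands are the LCG-2 era user-interface tools, their exact output format depends on the middleware version, and the wrapper script and file names are hypothetical.

```python
# A minimal sketch (not the GPS@ code) of the manual LCG-2 submission cycle
# that the GPS@ back office automates. The edg-job-* command names are from the
# EDG/LCG-2 user interface; the executable and file names are hypothetical.
import subprocess
import time

JDL = """\
Executable    = "run_analysis.sh";
Arguments     = "input.fasta";
InputSandbox  = {"run_analysis.sh", "input.fasta"};
OutputSandbox = {"result.txt", "stderr.log"};
"""

def submit(jdl_path="job.jdl"):
    with open(jdl_path, "w") as f:
        f.write(JDL)
    out = subprocess.run(["edg-job-submit", jdl_path],
                         capture_output=True, text=True, check=True).stdout
    # The broker prints the job identifier as an https:// URL.
    for token in out.split():
        if token.startswith("https://"):
            return token
    raise RuntimeError("no job identifier found in submission output")

def wait_until_done(job_id, poll_seconds=60):
    while True:
        status = subprocess.run(["edg-job-status", job_id],
                                capture_output=True, text=True).stdout
        if "Done" in status:
            return
        if "Aborted" in status:
            raise RuntimeError("job aborted")
        time.sleep(poll_seconds)

def fetch_output(job_id, target_dir="results"):
    subprocess.run(["edg-job-get-output", "--dir", target_dir, job_id], check=True)

if __name__ == "__main__":
    jid = submit()
    wait_until_done(jid)
    fetch_output(jid)
```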
• 7
Encrypted File System on the EGEE grid applied to Protein Sequence Analysis
Speakers: Dr Christophe Blanchet (CNRS IBCP), Mr Rémi Mollon (CNRS IBCP)
• 8
BIOINFOGRID: Bioinformatics Grid Application for life science
Project description: The European Commission promotes the Bioinformatics Grid Application for life science (BIOINFOGRID) project. The BIOINFOGRID project web site will be available at http://www.itb.cnr.it/bioinfogrid. The project aims to connect many European computer centres in order to carry out bioinformatics research and to develop new applications in the sector, using a network of services based on Grid networking technology, which represents the natural evolution of the Web. More specifically, the BIOINFOGRID project will make research in the fields of genomics, proteomics and transcriptomics, and applications in molecular dynamics, much easier, reducing data calculation times thanks to the distribution of the calculation, at any one time, over thousands of computers across Europe and the world. Furthermore, it will provide the possibility of accessing many different databases and hundreds of applications belonging to thousands of European users by exploiting the potential of the Grid infrastructure created by the EGEE European project and coordinated by CERN in Geneva. The BIOINFOGRID project foresees an investment of over one million euros funded through the European Commission's "Research Infrastructures" budget. Grid networking promises to be a very important step forward in the Information Technology field. Grid technology will make possible a global network made up of hundreds of thousands of interconnected computers, allowing the shared use of calculating power, data storage and structured compression of data. This goes beyond simple communication between computers and aims instead to transform the global network of computers into a vast joint computational resource. Grid technology is a very important step forward from the Web, which simply allows the sharing of information over the internet. The massive potential of Grid technology will be indispensable when dealing with both the complexity of models and the enormous quantity of data, for example in searching the human genome or when carrying out simulations of molecular dynamics for the study of new drugs. The grid collaborative and application aspects: The BIOINFOGRID project proposes to combine the bioinformatics services and applications for molecular biology users with the Grid infrastructure created by EGEE (6th Framework Programme). In the BIOINFOGRID initiative we plan to evaluate genomics, transcriptomics, proteomics and molecular dynamics application studies based on GRID technology. Genomics applications in GRID: • Analysis of the W3H task system for GRID. • GRID analysis of cDNA data. • GRID analysis of the NCBI and Ensembl databases. • GRID analysis of rule-based multiple alignments. Proteomics applications in GRID: • Pipeline analysis for protein functional domain search. • Surface protein analysis on the GRID platform. Transcriptomics and phylogenetics applications in GRID: • Data analysis specific to microarrays, allowing the GRID user to store and search this information with direct access to the data files stored on Data Storage Elements on GRID servers. • Validation of an infrastructure for phylogenetic applications, based on the execution of phylogenetic methods for tree estimation. Database and functional genomics applications: • To offer the possibility to manage and access biological databases by using the EGEE GRID. • To cluster gene products by their functionality as an alternative to the normally used comparison by sequence similarity.
Molecular dynamics applications: • To improve the scalability of molecular dynamics simulations. • To perform simulations of folding and aggregation of peptides and small proteins, to investigate structural properties of proteins and protein-DNA complexes, and to study the effect of mutations in proteins of biomedical interest. • To take part in the Wide In Silico Docking On Malaria challenge. EGEE and EGEE-II future plans: BIOINFOGRID will evaluate Grid usability in a wide variety of applications, with the aim of building a strong and united BIOINFOGRID community and of exploring and exploiting common solutions. The BIOINFOGRID collaboration will be able to establish a very large user group in bioinformatics in Europe. This cooperation will be able to promote bioinformatics and GRID applications in EGEE and EGEE-II. The aim of the BIOINFOGRID project is to bridge the gap, letting people from bioinformatics and the life sciences become aware of the power of Grid computing by simply trying to use it. We intend to pursue this goal by taking a number of key bioinformatics applications and getting them to run on the European Grid infrastructure. The most natural and important spin-off of the BIOINFOGRID project will then be a strong dissemination action within the user communities and across them. In fact, from one side, application experts will meet Grid experts and will learn how to re-engineer and adapt their applications to "run on the Grid" and, from the other side (and at the same time), application experts will meet experts of other applications, with a high probability that one community's expertise can be exploited as another's solution. The BIOINFOGRID project will provide EGEE-II with very useful input and feedback on the quality and efficiency of the infrastructure deployed and on the usefulness and effectiveness of the Grid services made available at the continental scale. In fact, having several bioinformatics scientific applications using these Grid services is a key opportunity to stress the generality of the services themselves.
Speaker: Dr Luciano Milanesi (National Research Council - Institute of Biomedical Technologies)
• 9
BioDCV: a grid-enabled complete validation setup for functional profiling
Abstract: BioDCV is a distributed computing system for the complete validation of gene profiles. The system is composed of a suite of software modules that allows the definition, management and analysis of a complete experiment on DNA microarray data. The BioDCV system is grid-enabled on the LCG/EGEE middleware in order to build predictive classification models and to extract the most important genes in large-scale molecular oncology studies. Performances are evaluated on a set of 6 cancer microarray datasets of different sizes and complexity, and then compared with results obtained on a standard Linux cluster facility. Introduction: The scientific objective of BioDCV is a large-scale comparison of prognostic gene signatures from cancer microarray datasets, realized by a complete validation system and run on the Grid. The models will constitute a reference experimental landscape for new studies. The outcomes of BioDCV consist of a predictive model, the straightforward evaluation of its accuracy, the lists of genes ranked by importance, and the identification of patient subtypes. Molecular oncologists from medical research centres and collaborating bioinformaticians are currently the target end-users of BioDCV. The comparisons presented in this paper demonstrate the feasibility of this approach on public data as well as on original microarray data from IFOM-FIRC. The complete validation scheme developed in our system involves an intensive replication of a basic classification task on resampled versions of the dataset. About 5x10^5 base models are developed, which may become 2x10^6 if the experiment is replicated with randomized output labels. The scheme must ensure that no selection bias effect contaminates the experiment. The cost of this caution is high computational complexity. Porting to the Grid: To guarantee fast, slim and robust code, and relational access to data and model descriptions, BioDCV was written in C and interfaced with SQLite (http://www.sqlite.org), a database engine which supports concurrent access and transactions, useful in a distributed environment where a dataset may be replicated for up to a few million models. In this paper, we present the porting of our application to grid systems, namely the Egrid (http://www.egrid.it) computational grid. The Egrid infrastructure is based on Globus/EDG/LCG2 middleware and is integrated as an independent virtual organization within Grid.it, the INFN production grid. The porting requires just two wrappers, one shell script to submit jobs and one C MPI program. When the user submits a BioDCV job to the grid, the grid middleware looks for the CE (Computing Element: where user tasks are delivered) and the WNs (Worker Nodes: machines where the grid user programs are actually executed) required to run the parallel program. As soon as the resources (CPUs in WNs) are available, the shell script wrapper is executed on the assigned CE. This script distributes the microarray dataset from the SE (Storage Element, which stores user data in the grid) to all the involved WNs. It then starts the C MPI wrapper, which spawns several instances of the BioDCV program itself. When all BioDCV instances are completed, the wrapper copies all outputs, including model and diagnostic data, from the WNs to the starting SE. Finally, the process outputs are returned, thus allowing the reconstruction of a complete data archive for the study.
Experiments and results: Two experiments were designed to measure the performance of the BioDCV parallel application in two different available computing environments: a standard Linux cluster and a computational grid. In Benchmark 1, we study the scalability of our application as a function of the number of CPUs. The benchmark is executed on a Linux cluster formed by 8 Xeon 3.0 CPUs and on the EGEE grid infrastructure, ranging from 1 to 64 Xeon CPUs. Two DNA microarray datasets are considered: LiverCanc (213 samples, ATAC-PCR, 1993 genes) and PedLeuk (327 samples, Affymetrix, 12625 genes). On both datasets we obtain a speed-up curve very close to linear. The speed-up factor for n CPUs is defined as the user time for one CPU divided by the user time for n CPUs. In Benchmark 2, we characterize the BioDCV application for different d (number of features) and N (number of samples) values for a complete validation experiment, and we execute a task for each dataset on the EGEE grid infrastructure using a fixed number of CPUs. The benchmark was run on a suite of six microarray datasets: LiverCanc, PedLeuk, BRCA (62 samples, cDNA, 4000 genes), Sarcoma (35 samples, cDNA, 7143 genes), Wang (286 samples, Affymetrix, 17816 genes), Chang (295 samples, cDNA, 25000 genes). It can be observed that the effective execution time (total execution time without queueing time at the grid site) increases linearly with the dataset footprint, i.e. the product of the number of genes and the number of samples. The performance penalty paid with respect to a standard parallel run performed on a local cluster is limited and is mainly due to data transfer from the user machine to the grid site and between WNs. Discussion and conclusions: The two experiments, which sum up to 139 CPU days within the Egrid infrastructure, indicate that the BioDCV system can be used on LCG/EGEE computational grids in practical large-scale experiments. The overall effort for gridification was limited to three months. We will investigate whether substituting the model of one single job asking for N CPUs (MPI approach) with a model that submits N different single-CPU jobs can overcome some limitations. The next step is porting our system to EGEE's Biomed VO. BioDCV is an open source application and is currently distributed under the GPL (Subversion repository at http://biodcv.itc.it).
Speaker: Silvano Paoli (ITC-irst)
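As a rough illustration of the wrapper logic described in the abstract above (stage the dataset in from the SE, launch the MPI wrapper that spawns the BioDCV instances, stage the outputs back to the SE), here is a hedged sketch, not the BioDCV code itself. The lcg-cp/lcg-cr data management commands are standard LCG utilities, while the VO name, LFN paths and executable names are hypothetical.

```python
# Sketch of the CE-side wrapper logic described in the abstract; not the
# original BioDCV shell script. VO, LFNs and executables are hypothetical.
import glob
import os
import subprocess

VO = "biomed"                                          # assumed virtual organisation
DATASET_LFN = "lfn:/grid/biomed/biodcv/dataset.db"     # hypothetical logical file name

def stage_in(local="dataset.db"):
    # Copy the dataset from the Storage Element to the worker node.
    subprocess.run(["lcg-cp", "--vo", VO, DATASET_LFN,
                    "file:" + os.path.abspath(local)], check=True)

def run_mpi(n_procs):
    # The C MPI wrapper spawns the BioDCV instances on the allocated worker nodes.
    subprocess.run(["mpirun", "-np", str(n_procs),
                    "./biodcv_mpi_wrapper", "dataset.db"], check=True)

def stage_out():
    # Copy and register every produced output file back on the starting SE.
    for path in glob.glob("output/*.db"):
        lfn = "lfn:/grid/biomed/biodcv/results/" + os.path.basename(path)
        subprocess.run(["lcg-cr", "--vo", VO, "-l", lfn,
                        "file:" + os.path.abspath(path)], check=True)

if __name__ == "__main__":
    stage_in()
    run_mpi(n_procs=16)
    stage_out()
```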
• 10
Application of GRID resource for modeling charge transfer in DNA
Recently, at the interface of physics, chemistry and biology, a new and rapidly developing research trend has emerged, concerned with charge transfer in biomacromolecules. Of special interest to researchers is electron and hole transfer along a chain of base pairs, since the migration of radicals over a DNA molecule plays a crucial role in the processes of mutagenesis and carcinogenesis. Moreover, understanding the mechanism of charge transfer is necessary for the development of a new field, concerned with charge transfer in organic conductors and their possible application in computing technology. To use biomolecules as conductors, one should know the rate of charge mobility. We calculate theoretical values of charge mobility on the basis of a quantum-classical model of charge transfer in various synthesized polynucleotides at varying temperature T of the environment. To take temperature fluctuations into account, a random force with specified statistical characteristics was added to the classical equations of site motion (Langevin force). (See e.g.: V.D. Lakhno, N.S. Fialko. Hole mobility in a homogeneous nucleotide chain // JETP Letters, 2003, v.78 (5), pp.336-338; V.D. Lakhno, N.S. Fialko. Bloch oscillations in a homogeneous nucleotide chain // Pisma v ZhETF, 2004, v.79 (10), pp.575-578). As is known, the results of most biophysical experiments are averaged values of macroscopic physical parameters (in our case, averaged over a great many DNA fragments in a solution). When modeling charge transfer in DNA at finite temperature, calculations should therefore be carried out for a great many realizations so as to find the average values of the macroscopic physical parameters. This formulation of the problem enables parallelization of the program over realizations, i.e. "one processor – one realization". A sequential algorithm is used for individual realizations. Initial values of site velocities and displacements are preset randomly from the requirement of equilibrium distribution at a given temperature. In calculating individual realizations, at each step a random number with specified characteristics is generated for the Langevin term. To make the problem of modeling charge transfer in a given DNA sequence at a prescribed temperature suitable for calculation using GRID resources, the original program was divided into 2 parts. The first program calculates one realization for given parameters. At the input it receives files with parameters and initial data. The peculiarity of the task is that we are interested in the dynamics of charge transfer, so at the program output we get several dozen MB of results. Using a special script, 100-150 copies of the program are run with the same parameters and random initial data. Upon completion of the computations, the files of results are compressed and transmitted to a predefined SE. When an appropriate number of realizations has been calculated, the second program runs once. It calculates average values for charge probabilities, for site displacements from equilibrium, etc. A special script is sent to run this program on a WN. This WN takes the files with realization results from the SE in series of 10 items. For each series the averaging program runs (at the output one gets the data averaged over 10 realizations). If the output file of a current realization is absent or defective, it is ignored, and the next output file is taken. The files obtained are processed by this averaging program again.
This makes our results independent of chance failures in the calculations of individual realizations. Using GRID resources in this way, we have carried out calculations of the hole mobility at different temperatures in the range from 10 to 300 K for (GG) and (GC) polynucleotide sequences (several thousand realizations).
Speaker: Ms Nadezhda Fialko (research fellow)
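The two-level averaging procedure described in the abstract (average realization outputs in series of 10, skip absent or defective files, then average the partial averages again) can be illustrated by the short sketch below; it is not the original program, and the file names and data layout are hypothetical.

```python
# Illustrative sketch (not the original code) of the two-level averaging the
# abstract describes: realization outputs are averaged in series of 10,
# missing or defective files are skipped, and the partial averages are
# averaged again. File names and the flat-text data format are assumptions.
import numpy as np

def load_realization(path):
    """Return the realization as an array, or None if absent or defective."""
    try:
        data = np.loadtxt(path)
        return data if data.size else None
    except (OSError, ValueError):
        return None

def average_in_series(paths, series_len=10):
    partial = []
    for i in range(0, len(paths), series_len):
        block = [a for a in map(load_realization, paths[i:i + series_len])
                 if a is not None]
        if block:
            partial.append(np.mean(block, axis=0))      # average over one series
    return np.mean(partial, axis=0) if partial else None # average the averages

if __name__ == "__main__":
    files = [f"realization_{k:03d}.dat" for k in range(150)]  # hypothetical names
    result = average_in_series(files)
    if result is not None:
        np.savetxt("averaged.dat", result)
```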
• 11
A service to update and replicate biological databases
Speaker: Mr Jean Salzemann (IN2P3/CNRS)
• 3:30 PM
Questions and discussion
• 4:00 PM
COFFEE
• 12
Using Grid Computation to Accelerate Structure-based Design Against Influenza A Neuraminidases
Speaker: Dr Ying-Ta Wu (Academia Sinica Genomic Research Center)
• 13
In silico docking on EGEE infrastructure: the case of WISDOM
Advances in combinatorial chemistry have paved the way for synthesizing large numbers of diverse chemical compounds. Thus there are millions of chemical compounds available in the laboratories, but it is nearly impossible and very expensive to screen such a high number of compounds experimentally by high throughput screening (HTS). Besides the high costs, the hit rate in HTS is quite low, about 10 to 100 per 100,000 compounds when screened on targets such as enzymes. An alternative is high throughput virtual screening by molecular docking, a technique which can screen millions of compounds rapidly, reliably and cost effectively. Screening millions of chemical compounds in silico is a complex process. Screening each compound, depending on structural complexity, can take from a few minutes to hours on a standard PC, which means screening all compounds in a single database can take years. Computation time can be reduced very significantly with a large grid gathering thousands of computers. WISDOM (World-wide In Silico Docking On Malaria) is a European initiative to enable the in silico drug discovery pipeline on a grid infrastructure. Initiated and implemented by the Fraunhofer Institute for Algorithms and Scientific Computing (SCAI) in Germany and the Corpuscular Physics Laboratory (CNRS/IN2P3) of Clermont-Ferrand in France, WISDOM has deployed a large scale docking experiment on the EGEE infrastructure. Three goals motivated this first experiment. The biological goal was to propose new inhibitors for a family of proteins produced by Plasmodium falciparum. The biomedical informatics goal was the deployment of in silico virtual docking on a grid infrastructure. The grid goal was the deployment of a CPU-consuming application generating large data flows, to test the grid operation and services. Relevant information can be found at http://wisdom.eu-egee.fr and http://public.eu-egee.org/files/battles-malaria-grid-wisdom.pdf. With the help of the grid, large scale in silico experimentation is possible. Large resources are needed in order to test, in a transparent way, a family of targets, a large enough set of possible drug candidates and different virtual screening tools with different parameter / scoring settings. The grid added value lies not only in the computing resources made available, but also in the permanent storage of the data with transparent and secure access. A reliable Workload Management System, Information Service and Data Management Services are absolutely necessary for a large scale process. Accounting, security and license management services are also essential to make an impact on the pharmaceutical community. In the near future, we expect improved data management middleware services to allow automatic updates of the compound database and the design of a grid knowledge space where biologists can analyze output data. Finally, key issues to promote the grid in the pharmaceutical community include cost and time reduction in drug discovery and development, security and data protection, fault-tolerant and robust services and infrastructure, and transparent and easy-to-use interfaces. The first biomedical data challenge ran on the EGEE grid production service from 11 July 2005 until 19 August 2005. The challenge saw over 46 million docked ligands, the equivalent of 80 years on a single PC, in about 6 weeks. Usually in silico docking is carried out on classical computer clusters, resulting in around 100,000 docked ligands.
This type of scientific challenge would not be possible without the grid infrastructure: 1700 computers were used simultaneously in 15 countries around the world. The WISDOM data challenge demonstrated how grid computing can help drug discovery research by speeding up the whole process and reducing the cost of developing new drugs to treat diseases such as malaria. The sheer amount of data generated indicates the potential benefits of grid computing for drug discovery and, indeed, other life science applications. Commercial software with a server license was successfully deployed on more than 1000 machines at the same time. The first docking results show that 10% of the compounds of the database studied may be hits. Top-scoring compounds possess basic chemical groups such as thiourea, guanidino and amino-acrolein core structures. The identified compounds are non-peptidic, low-molecular-weight compounds. The first future plan of the WISDOM initiative is to process the hits again with molecular dynamics simulations. A WISDOM demonstration will be prepared with the aim of showing the submission of docking jobs on the grid at a large scale. A second data challenge, planned for the fall of 2006, is also under preparation to improve the quality of service and the quality of usage of the data challenge process on gLite.
Speaker: Mr Nicolas Jacq (CNRS/IN2P3)
• 14
Early Diagnosis of Alzheimer’s Disease Using a Grid Implementation of Statistical Parametric Mapping Analysis
A voxel-based statistical analysis of perfusional medical images may provide powerful support for the early diagnosis of Alzheimer's Disease (AD). A Statistical Parametric Mapping algorithm (SPM), based on the comparison of the candidate with normal cases, has been validated by the neurological research community to quantify hypometabolic patterns in brain PET/SPECT studies. Since suitable "normal patient" PET/SPECT images are rare and usually sparse and scattered across hospitals and research institutions, the Data Grid distributed analysis paradigm ("move code rather than input data") is well suited for implementing a remote statistical analysis use case, described as follows.
Speaker: Mrs Livia Torterolo (Bio-Lab, DIST, University of Genoa)
• 15
SIMRI@Web: An MRI Simulation Web Portal on EGEE Grid Architecture
In this paper, we present a web portal that enables the simulation of MRI images on the grid. Such simulations are done using the SIMRI MRI simulator, which is implemented on the grid using MPI. MRI simulations are useful for better understanding MRI physics, for studying MRI sequences (parameterisation), and for validating image processing algorithms. The web portal client/server architecture is mainly based on a Java thread that screens a database of simulation jobs. The thread submits the new jobs to the grid and updates the status of the running jobs. When a job is terminated, the thread sends the simulated image to the user. Through a client web interface, the user can submit new simulation jobs, get a detailed status of the running jobs, and view the history of all terminated jobs as well as their status and corresponding simulated images. As MRI simulation is computationally very expensive, grid technologies appear to be a real added value for the MRI simulation task. Nevertheless, grid access should be simplified to enable the final user to run MRI simulations. That is why we developed this specific web portal, to provide a user-friendly interface for MRI simulation on the grid.
Speaker: Prof. Hugues BENOIT-CATTIN (CREATIS - UMR CNRS 5515 - U630 Inserm)
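The abstract describes a server-side thread that screens a database of simulation jobs, submits new ones to the grid, refreshes the status of running ones and returns the simulated image on completion. The portal itself is written in Java; the following is a minimal Python sketch of that polling loop only, with hypothetical table and column names and with the grid submission, status and delivery calls left as injected helpers.

```python
# Minimal sketch of the polling loop the abstract describes for the portal's
# screening thread. Schema, column names and the submit/status/delivery
# helpers are hypothetical; the real portal implements this in Java.
import sqlite3
import time

def poll_once(db, submit_to_grid, grid_status, send_image_to_user):
    cur = db.cursor()
    # Submit jobs that the web interface has just created.
    for job_id, jdl in cur.execute(
            "SELECT id, jdl FROM jobs WHERE state = 'NEW'").fetchall():
        grid_id = submit_to_grid(jdl)
        cur.execute("UPDATE jobs SET state = 'RUNNING', grid_id = ? WHERE id = ?",
                    (grid_id, job_id))
    # Refresh running jobs and deliver the results of terminated ones.
    for job_id, grid_id, user in cur.execute(
            "SELECT id, grid_id, user FROM jobs WHERE state = 'RUNNING'").fetchall():
        if grid_status(grid_id) == "Done":
            send_image_to_user(user, grid_id)
            cur.execute("UPDATE jobs SET state = 'DONE' WHERE id = ?", (job_id,))
    db.commit()

def run(db_path="simri_jobs.db", period_seconds=60, **hooks):
    db = sqlite3.connect(db_path)
    while True:
        poll_once(db, **hooks)
        time.sleep(period_seconds)
```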
• 16
Application of the Grid to Pharmacokinetic Modelling of Contrast Agents in Abdominal Imaging
The liver is the largest organ of the abdomen and there is a large number of lesions affecting it. Both benign and malignant tumours arise within it. The liver is also the target organ for metastases of most solid tumours. Angiogenesis is an important marker of tumour aggressiveness and response to therapy. The blood supply to the liver is derived jointly from the hepatic arteries and the portal venous system. Dynamic Contrast Enhanced Magnetic Resonance Imaging (DCE-MRI) is extensively used for the detection of primary and metastatic hepatic tumours. However, the assessment of early stages of malignancy and of other diseases such as cirrhosis requires the quantitative evaluation of the hepatic arterial supply. To achieve this goal, it is important to develop precise pharmacokinetic approaches to the analysis of hepatic perfusion. The influence of breathing, the large number of pharmacokinetic parameters and the fast variations in contrast concentration in the first moments after contrast injection reduce the efficiency of traditional approaches. On the other hand, traditional radiological analysis requires the acquisition of images covering the whole liver, which greatly reduces the time resolution of the pharmacokinetic curves. The combination of all these adverse factors makes the analytical study of liver DCE-MRI data very challenging. The final objective of the work we present here is to provide users with a tool to optimally select the parameters that describe the pharmacokinetic model of the liver. This tool will use the Grid as a source of computing power and will offer a simple and user-friendly interface. The tool enables the execution of large sets of co-registration actions, varying the values of the different parameters and easing the process of transferring the source data and the results. Since the Grid is mainly a batch environment (and co-registration is not an interactive process, due to its long duration), the tool must provide a simple way to monitor the status of the processing. Finally, the process must be completed in the shortest time possible, considering the resources available.
Speaker: Dr Ignacio Blanquer (Universidad Politécnica de Valencia)
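The abstract describes launching large sets of co-registration runs while varying the pharmacokinetic parameters. The sketch below illustrates, under stated assumptions, how such a parameter sweep could be expanded into one independent batch job per parameter combination; the parameter names and ranges are hypothetical, and submit_job stands in for whatever grid submission call the tool actually uses.

```python
# Hypothetical sketch of expanding a pharmacokinetic parameter sweep into
# independent jobs (one co-registration run per combination). Parameter names,
# ranges and the submit_job callback are assumptions, not the authors' tool.
import itertools

PARAMETER_GRID = {
    "arterial_fraction": [0.1, 0.2, 0.3],
    "transfer_constant": [0.05, 0.10, 0.20],
    "smoothing_mm":      [2, 4],
}

def expand_sweep(grid):
    """Yield one dict of parameter values per job."""
    names = sorted(grid)
    for values in itertools.product(*(grid[n] for n in names)):
        yield dict(zip(names, values))

def submit_all(submit_job, grid=PARAMETER_GRID):
    # Returns one job identifier per submitted parameter combination.
    return [submit_job(params) for params in expand_sweep(grid)]

if __name__ == "__main__":
    # Dry run: print the combinations that would be submitted.
    for params in expand_sweep(PARAMETER_GRID):
        print(params)
```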
• 17
Construction of a Mathematical Model of a Cell as a Challenge for Science in the 21st Century and the EGEE Project
As recently as a few years ago the possibility of constructing a mathematical model of life seemed absolutely fantastic. However, at the beginning of the 21st century several research teams announced the creation of a minimal model of life. To be more specific, not of life in general, but of an elementary brick of life, that is, a living cell. The most well-known of them are: the USA Virtual Cell Project (V-Cell), NIH (http://www.nrcam.uchc.edu/vcellR3/login/login.jsp); the Japanese E-Cell project (http://ecell.sourceforge.net/); and the Dutch project ViC (Virtual Cell) (http://www.bio.vu.nl/hwconf/Silicon/index.html). The above projects deal mainly with the kinetics of cell processes. New approaches to modeling imply the development of imitation models to simulate the functioning of cell mechanisms and the devising of software to simulate a complex of interrelated and interdependent processes (such as gene networks). With the emergence of the opportunity to use GRID infrastructure for solving such problems, new and bright prospects have opened up. The aim of the Mathematical Cell project (http://www.mathcell.ru), realized at the Joint Center for Computational Biology and Bioinformatics (www.jcbi.ru) of the IMPB RAS, is to develop an integrated model of an object more complex than a prokaryotic cell, namely a eukaryotic cell. The functioning of a cell is simulated based on the belief that cell life is mainly determined by the processes of charge transfer in all its constituent elements. Since (as in physics, where the universe is thought to have arisen as a result of a Big Bang) life originated from a DNA molecule, modeling should start from the DNA. The MathCell model repository includes software to calculate charge transfer in an arbitrary nucleotide sequence of a DNA molecule. A sequence to be analyzed may be specified by a user or taken from databanks presented at the site of the Joint Center for Computational Biology and Bioinformatics (http://www.jcbi.ru). Presently, the MathCell site demonstrates the simplest model of charge transfer. In the framework of the EGEE GRID project, any user registered and certified in the EGEE infrastructure can use both the program and the computational resources offered by EGEE. In the near future IMPB RAS is planning to deploy in EGEE a software tool to calculate charge transfer on the inner membranes of some compartments of eukaryotic cells (mitochondria and chloroplasts) through direct simulation of charge transfer with regard to the detailed structure of biomembranes containing various molecular complexes. Next on the agenda is a software tool to calculate metabolic reaction pathways in compartments of a cell as well as the dynamics of gene networks. Further development of the MathCell project implies integration of the individual components of the model into an integrated program system which would enable modeling of cell processes at all levels – from microscopic to macroscopic scales and from picoseconds to scales comparable with the cell lifetime. Such modeling will naturally require combining the computational and communication resources provided by the EGEE project and merging them into an integrated computational medium.
Speaker: Prof. Victor Lakhno (IMPB RAS, Russia)
• 6:00 PM
Wind-up questions and discussion
• 1b: Astrophysics/Astroparticle physics - Fusion - High-Energy physics 40/5-A01


Brings together three major scientific communities using EGEE for large-scale computation and data sharing

• 18
Benefits of the MAGIC Grid
Application context and scientific goals
========================================
The field of gamma-ray observations in the energy range between 10 GeV and 10 TeV has developed fast over the last decade. From the first observation of TeV gamma rays from the Crab nebula using the atmospheric Cerenkov imaging technique in 1989 [1] to the discovery of new gamma-ray sources with the new generation of telescopes, such as the HESS observation of high-energy particle acceleration in the shell of a supernova remnant [2], a new observation window to the universe has been opened. In the future, other ground-based VHE $\gamma$-ray observatories (namely MAGIC [3], VERITAS [4] and KANGAROO [5]) will contribute significantly to the exploitation of this new observation window. With the new generation of Cerenkov telescopes, the requirements for the analysis and Monte Carlo production computing infrastructure will increase due to a higher number of camera pixels, faster FADC systems and a bigger mirror size. In the future, the impact of VHE gamma-ray astronomy will increase through joint observations by different Cerenkov telescopes. In 2003 the national Grid centres in Italy (CNAF), Spain (PIC) and Germany (GridKA) started, together with the MAGIC collaboration, an effort to build a distributed computing system for Monte Carlo generation and analysis on top of the existing Grid infrastructure. The MAGIC telescope was chosen for the following reasons:
o The MAGIC collaboration is international, with most partners from Europe.
o The main partners of the MAGIC telescope are located close to the national Grid centres.
o The generation of Monte Carlo data is very compute intensive, especially to get enough statistics in the low energy range.
o The analysis of the fast-increasing real data samples will be done in different institutes. The collaborators need seamless access to the data while reducing the number of replicas to a minimum.
o The MAGIC collaboration will build a second telescope in 2007, resulting in a doubled data rate.
The idea of the MAGIC Grid [6] was presented to the EGEE Generic Application Advisory Panel (EGAAP). In June 2004 EGEE accepted the generation of Monte Carlo data for the MAGIC telescope as one of the generic applications of the project.
Grid added value
================
By implementing the MAGIC Grid over the last two years, the MAGIC collaboration has benefited in many respects. These aspects are described in this chapter.
o Collaboration of different institutes: by combining the resources of the MAGIC collaborators and the reliable resources of the national Grid centres, the MAGIC collaborators will be empowered to use their computing infrastructure more efficiently. The time needed to analyse the large amount of data required to solve specific scientific problems will be shortened.
o Cost reduction: by using the EGEE infrastructure and the EGEE services, the effort required for the MAGIC collaboration to build a distributed computing system for the Monte Carlo simulations was significantly reduced.
o Speed-up of Monte Carlo production: as the MAGIC Monte Carlo system was built on top of the EGEE middleware, the integration of new computing resources is very easy. By getting support from many different EGEE resource providers, the production rate for the Monte Carlo data can be increased very easily.
o Persistent storage of observation data: the MAGIC telescope will produce a lot of data in the future. These data are currently stored on local resources, including disk systems and tape libraries. The MAGIC collaboration recognized that this effort is not negligible, especially concerning manpower. Therefore the observation data will be stored by the Spanish Grid centre PIC.
o Data availability improvements: by importing the observation data to the Grid, the MAGIC collaboration expects that the availability of the data will be increased with the help of Grid data management methods such as data replication. As the main data services will in the future be provided by the national Grid centres instead of research groups at universities, the overall data availability is expected to increase.
Experiences with the EGEE infrastructure
========================================
The experiences of the developers during the different phases of the realisation of the MAGIC Monte Carlo production system on the EGEE Grid infrastructure are described in this chapter. As the MAGIC virtual organisation was accepted as one of the first generic EGEE applications, the development process was also influenced by general developments within the EGEE project, such as changes in the middleware versions.
o Prototype implementation: the migration of the compute-intensive MMCS program from a local batch system to the Grid was done through the definition of a template JDL form. This template sends all needed input data together with the executable to the Grid. The resources are chosen by the resource broker. The automatic registration of the output file as a logical file on the Grid was not very reliable at the beginning, but improved to production level within the EGEE project duration.
o Production MAGIC Grid system: the submission of many production jobs needed the implementation of a graphical user interface and a database for metadata. The graphical user interface was realised in the Java programming language. The execution of the LCG/gLite commands is wrapped in Java shell calls. A MySQL database holds the schema for the metadata. As mentioned above, the "copy and register" process for the output file was not reliable enough, so an additional job status "DONE (data available)" was introduced. With the help of the database, jobs that do not reach this job status within two days are resubmitted. The job data are kept in a separate database table for later analysis.
o Reliability of EGEE services: the general services such as resource brokers, VO management tools and Grid user support were provided by the EGEE resource providers. The MAGIC Grid is set up on top of these services. A short report on the experiences with these production services will be given.
Key issues for the future of Grid technology
============================================
The MAGIC collaboration is currently evaluating the EGEE Grid infrastructure as the backbone for a future distributed computing system, including data storage at Grid data centres such as PIC. Furthermore, discussions with other projects such as the HESS collaboration have started, moving towards a "Virtual Very High Energetic Gamma-ray Observatory" [7]. The problems and challenges that need to be solved on the track to a sustainable Grid infrastructure will be discussed from the user perspective.
References:
[1] T. Weekes et al., The Astrophysical Journal, volume 342 (1989), p. 379
[2] F. A. Aharonian et al., Nature 432, 75-77 (04 November 2004)
[3] E. Lorenz, 1995, Fruehjahrstagung Deutsche Physikalische Gesellschaft, March 9-10
[4] T. Weekes et al., Astropart. Phys., 17, 221-243 (2002)
[5] Enomoto, R. et al., Astropart. Phys. 16, 235-244 (2002)
[6] H. Kornmayer et al., "A distributed, Grid-based analysis system for the MAGIC telescope", Proceedings of the CHEP Conference, Interlaken, Switzerland, 2004
[7] H. Kornmayer et al., "Towards a virtual observatory for high energetic gamma rays", Cherenkov 2005, Paris, 2005
Speaker: Dr Harald Kornmayer (FORSCHUNGSZENTRUM KARLSRUHE (FZK))
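The abstract's production system resubmits any job that has not reached the extra "DONE (data available)" status within two days and archives the old job record for later analysis. The sketch below illustrates that rule only; the schema, column names and the resubmit callback are hypothetical (the real system uses a MySQL database behind a Java GUI).

```python
# Hypothetical sketch of the resubmission rule described in the abstract.
# sqlite3 is used only to keep the example self-contained; the production
# system keeps its metadata in MySQL.
import sqlite3
import time

TWO_DAYS = 2 * 24 * 3600

def resubmit_stale_jobs(db, resubmit):
    cur = db.cursor()
    now = time.time()
    rows = cur.execute(
        "SELECT id, jdl, submitted_at FROM jobs "
        "WHERE status != 'DONE_DATA_AVAILABLE'").fetchall()
    for job_id, jdl, submitted_at in rows:
        if now - submitted_at < TWO_DAYS:
            continue                       # still within the two-day window
        # Keep the old record in a separate table for later analysis,
        # then submit a fresh copy of the job.
        cur.execute("INSERT INTO jobs_history SELECT * FROM jobs WHERE id = ?",
                    (job_id,))
        new_grid_id = resubmit(jdl)
        cur.execute("UPDATE jobs SET grid_id = ?, submitted_at = ? WHERE id = ?",
                    (new_grid_id, now, job_id))
    db.commit()
```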
• 19
Status of Planck simulations application
Speaker: Dr Claudio Vuerli (INAF-SI)
• 20
FUSION ACTIVITIES IN THE GRID
Future magnetic confinement fusion energy research will be mainly based upon large international facilities, with the participation of many scientists belonging to different institutes. For instance, the large device ITER (International Thermonuclear Experimental Reactor), which will be built in Cadarache (France), involves six partners: Europe, Japan, USA, Russia, China, and Korea. India is presently involved in negotiations to join the project and Brazil is also considering the possibility of joining. Besides ITER, the fusion community has a strong collaboration structure devoted to both tokamak and stellarator research. As a result of this structure, there exists a network of groups and institutes that share facilities and/or results obtained on those facilities. Magnetic fusion facilities are large devices devoted to the study of plasma physics that produce a large amount of data to be analysed (the typical data production rate is about 1 GB/s for a conventional device, and it can reach a value ten times larger in ITER). The analysis and availability of those data is a key point for the scientific exploitation of those devices. In addition, large computations are needed for understanding plasma physics and for developing new calculation methods, and these are very CPU-time consuming. Part of this computational effort can be performed in a distributed way, and Grid technologies are very suitable for performing those calculations. Several plasma physics applications that can be distributed over different processors are being considered for adaptation to the grid. In particular, Monte Carlo codes are suitable and powerful tools for performing transport calculations, especially in cases like the TJ-II stellarator that present radially extended ion orbits, which have a strong influence on confinement: the fact that orbits are wide means that ions perform large radial excursions during a collision time, which enhances the outward heat flux. The usual transport calculations, based on local plasma characteristics that give local transport coefficients, are not suitable for this kind of geometry in the long mean free path regime. The suitable way to estimate transport is to follow millions of individual particles that move in a background plasma and magnetic configuration. The interaction with other particles is simulated by a collision operator, which depends on density and temperature, and by a steady-state electric field, caused by the unbalanced electron and ion fluxes. This tool will also be useful for taking into account other kinetic effects on electron transport, such as those related to heating and current drive. This transport tool is now running on a supercomputer and is being prepared to be ported to the grid, where it will run soon. The capability of performing massive kinetic transport calculations will allow us to explore transport properties under different heating conditions and collisionalities, as well as with different electric field profiles. Another application that requires distributed calculations is massive ray tracing. The properties of microwave propagation and absorption are estimated in the geometrical optics (or WKB) approximation by simulating the microwave beam by a bunch of rays. Those rays are launched and followed inside the plasma and all the necessary quantities are estimated along the ray trajectories. Since all the rays are independent, they can be calculated separately.
The number of rays needed in a normal case is typically 100 or 200, and the time needed for every ray estimate is about 10-20 minutes. This approximation works when the waist of the beam is far from any critical layer in the plasma. Critical layers are those where mode conversion, absorption, or reflection of microwaves happens. When the waist of the beam is close to critical layers, a much higher number of rays is needed to simulate the beam. The typical number can be of the order of 10000, which is high enough to make it necessary to run the application on the grid. Massive ray tracing calculations could also be useful to determine the optimum microwave launching position in a complex 3D device like a real stellarator. These two applications require that a common file with stellarator geometry data is distributed to all the processors, together with individual files containing the initial data of every ray and trajectory. Stellarator devices present different magnetic configurations with different confinement properties. It is necessary to look for the magnetic configuration that presents the best confinement properties, considering the experimental knowledge of confinement and transport in stellarators. Therefore, stellarator optimization is a very important topic for designing the future stellarators that will play a role in magnetic confinement fusion. The optimization procedure has to take into account many criteria that are based on previous stellarator experience: neoclassical transport properties, viscosity, stability, etc. A possible way to develop this procedure is to parametrize the plasma by the Fourier coefficients that describe the magnetic field. Every set of coefficients is considered as a different stellarator with different properties. The optimization procedure has to take into account the desired characteristics for a magnetic configuration to be suitable for an optimised stellarator. The optimization criteria are set through functions that take into account the properties that favour plasma confinement. Every case can be run in a separate node of the grid in order to explore the hundreds of parameters that are involved in the optimization. Presently, other applications are being considered for running on the grid in order to solve efficiently some plasma physics problems that are relevant for future magnetic confinement devices. For instance, transport analysis is a key instrument in plasma physics that gives the transport coefficients that fit the experimental data. Transport analysis is performed using transport codes on the real plasma discharges. A plasma confinement device can perform tens of thousands of discharges over its lifetime, and only a few of them are analysed. It would be possible to install a transport code on the grid that performs automatic transport analysis on the experimental shots. In this way, the dependence of the local transport coefficients on plasma parameters such as magnetic configuration, density, temperature, electric field, etc. can be extracted. Finally, the tokamak equilibrium code EDGE2D can be installed on the grid to obtain equilibrium parameters in the edge, which is basic for estimating the exact plasma position and the equilibrium properties at the plasma edge.
Speaker: Dr Francisco Castejon (CIEMAT)
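Because the rays in the massive ray-tracing application are independent, the natural grid mapping is one job per bundle of rays, with the common stellarator geometry file shipped alongside each job, as the abstract describes. The sketch below illustrates that splitting; the JDL attribute names follow the LCG/gLite convention, while the ray_tracer executable and file names are hypothetical.

```python
# Illustrative sketch of splitting a massive ray-tracing run into independent
# grid jobs: every job carries the common geometry file plus the initial data
# of its own bundle of rays. Executable and file names are hypothetical.

JDL_TEMPLATE = """\
Executable    = "ray_tracer";
Arguments     = "geometry.dat rays_{idx:04d}.dat";
InputSandbox  = {{"ray_tracer", "geometry.dat", "rays_{idx:04d}.dat"}};
OutputSandbox = {{"trajectories_{idx:04d}.dat"}};
"""

def write_jdls(n_rays=10000, rays_per_job=100):
    """Write one JDL file per bundle of rays and return the number of jobs."""
    n_jobs = (n_rays + rays_per_job - 1) // rays_per_job
    for idx in range(n_jobs):
        with open(f"ray_job_{idx:04d}.jdl", "w") as f:
            f.write(JDL_TEMPLATE.format(idx=idx))
    return n_jobs

if __name__ == "__main__":
    print(f"{write_jdls()} JDL files written")
```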
• 21
Massive Ray Tracing in Fusion Plasmas on EGEE
• 4:00 PM
break

• 22
Genetic Stellarator Optimisation in Grid
Speaker: Mr Vladimir Voznesensky (Nuclear Fusion Inst., RRC "Kurchatov Inst.")
• 23
Experiences on Grid production for Geant4
Geant4 is a general purpose toolkit for simulating the tracking and interaction of particles through matter. It is currently used in production in several particle physics experiments (BaBar, HARP, ATLAS, CMS, LHCb), and it also has applications in other areas, such as space science, medical applications, and radiation studies. The complexity of the Geant4 code requires careful testing of all of its components, especially before major releases (which happen twice a year, in June and December). In this talk, I will describe the recent development of an automatic suite for testing hadronic physics in high energy calorimetry applications. The idea is to use a simplified set of hadronic calorimeters, with different beam particle types and various beam energies, and to compare relevant observables between a given reference version of Geant4 and the new candidate one. Only those distributions that are statistically incompatible are then printed out and finally inspected by a person to look for possible bugs. The suite is made of Python scripts, utilizes the "Statistical Toolkit" for the statistical tests between pairs of distributions, and runs on the Grid to cope with the large amount of CPU needed in a short period of time. In fact, the total CPU time required for each of these Geant4 release validation productions amounts to about 4 CPU-years, which has to be concentrated in a couple of weeks. Therefore, the Grid environment is the natural candidate for performing this validation production. We have already run three of them, starting in December 2004. In the last production, in December 2005, we ran as the Geant4 VO for the first time, demonstrating the full involvement of Geant4 in the EGEE communities. Several EGEE sites have provided us with the needed CPU, and this has guaranteed the success of the production, reaching an overall efficiency of about 99%. In the talk, emphasis will be given to our experiences in using the Grid, the results we got from it and possible future improvements. Technical aspects of the Grid framework that has been deployed for the production will only be mentioned; for more details see the talks of P. Mendez and J. Moscicki.
Speaker: Dr Alberto Ribon (CERN)
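The comparison step described in the abstract, flagging only the statistically incompatible distributions between the reference and candidate Geant4 versions for human inspection, can be sketched as below. The real suite uses the "Statistical Toolkit"; scipy's two-sample Kolmogorov-Smirnov test stands in here, and the observable names and toy samples are purely illustrative.

```python
# Minimal sketch of comparing observables between a reference and a candidate
# release and reporting only the statistically incompatible pairs. scipy's KS
# test replaces the Statistical Toolkit used by the real suite.
import numpy as np
from scipy.stats import ks_2samp

def incompatible_observables(reference, candidate, p_threshold=0.01):
    """reference/candidate: dict of observable name -> 1D array of sampled values."""
    flagged = []
    for name in reference:
        _, p_value = ks_2samp(reference[name], candidate[name])
        if p_value < p_threshold:
            flagged.append((name, p_value))
    return flagged

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = {"shower_energy": rng.normal(50.0, 5.0, 10000)}
    cand = {"shower_energy": rng.normal(50.5, 5.0, 10000)}   # toy shifted sample
    for name, p in incompatible_observables(ref, cand):
        print(f"{name}: p = {p:.3g} -> inspect")
```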
• 24
The ATLAS Rome Production Experience on the LHC Computing Grid
The Large Hadron Collider at CERN will start data acquisition in 2007. The ATLAS (A Toroidal LHC ApparatuS) experiment is preparing for the data handling and analysis via a series of Data Challenges and production exercises to validate its computing model and to provide useful samples of data for detector and physics studies. The last Data Challenge, begun in June 2004 and ended in early 2005, was the first performed completely in a Grid environment. Immediately afterwards, a new production activity was necessary in order to provide the event samples for the ATLAS physics workshop taking place in June 2005 in Rome. This exercise offered a unique opportunity to assess the improvements achieved and to continue the validation of the computing model. In this contribution we discuss the experience of the "Rome production" on the LHC Computing Grid infrastructure, describing the achievements, the improvements with respect to the previous Data Challenge and the problems observed, together with the lessons learned and future plans.
Speaker: Dr Simone Campana (CERN/IT/PSS)
• 25
CRAB: a tool for CMS distributed analysis in a grid environment
The CMS experiment will produce a large amount of data (a few PBytes each year) that will be distributed and stored in many computing centres spread across the countries participating in the CMS collaboration, and made available for analysis to physicists distributed world-wide. CMS will use a distributed architecture based on grid infrastructure to analyze data stored at remote sites, to grant data access only to authorized users and to ensure the availability of remote resources. Data analysis in a distributed environment is a complex computing task: one has to know which data are available, where the data are stored and how to access them. The CMS collaboration is developing a user-friendly tool, CRAB (CMS Remote Analysis Builder), whose aim is to simplify for end users the creation and submission of analysis jobs to the grid environment. Its purpose is to allow generic users, without specific knowledge of the grid infrastructure, to access and analyze remote data as easily as in a local environment, hiding the complexity of the distributed computational services. Users develop their analysis code in an interactive environment and decide which data to analyze, providing CRAB with the data parameters (keywords to select the data and the total number of events) and with instructions on how to manage the produced output (return the files to the UI or store them on remote storage). CRAB creates a wrapper around the analysis executable, to be run on remote resources, which includes the CMS environment setup and the output management. CRAB splits the analysis into a number of jobs according to the user-provided information about the number of events. The job submission is done using the grid workload management commands. The user executable is sent to the remote resource via the input sandbox, together with the job. Data discovery, resource availability, status monitoring and output retrieval of submitted jobs are fully handled by CRAB. The tool is written in Python and has to be installed on the User Interface, the user's access point to the grid. Up to now CRAB is installed on ~45 UIs, and about ~210 different kinds of data are available at ~40 remote sites. The weekly rate of submitted jobs is ~10000, with a success rate of about 75% (jobs that arrive at remote sites and produce output), while the remaining 25% abort due to site setup problems or grid service failures. In this report we will explain how CRAB is interfaced with other CMS/grid services and will report on the daily experience of users analyzing with this tool the simulated data needed to prepare the Physics Technical Design Report.
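A minimal sketch, with hypothetical numbers, of the event-based job splitting described in the abstract above: given the total number of events requested by the user and a number of events per job, the splitting produces one (first event, number of events) range per grid job. This only illustrates the idea; it is not CRAB code.

    # Illustrative sketch of event-based analysis-job splitting (not CRAB itself).
    def split_by_events(total_events, events_per_job):
        """Yield (first_event, n_events) pairs, one per grid job."""
        first = 0
        while first < total_events:
            n = min(events_per_job, total_events - first)
            yield first, n
            first += n

    # Example: the user asks for 100 000 events, at most 8 000 events per job.
    jobs = list(split_by_events(100_000, 8_000))
    print(len(jobs), "jobs; last range:", jobs[-1])   # 13 jobs, the last one shorter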
• 1c: Earth Observation - Archaeology - Digital Library 40-SS-D01

40-SS-D01

CERN

• 2:00 PM
Introduction to the parallel session
• 26
Diligent and OpenDLib: long and short term exploitation of a gLite Grid Infrastructure
Speaker: Dr Davide Bernardini (CNR-ISTI)
• 27
Data Grid Services for National Digital Archives Program in Taiwan
Speaker: Mr Eric Yen (Academia SINICA Grid Computing Centre, Taiwan)
• 2:45 PM
Discussion
• 28
Project gridification: the UNOSAT experience
The EGEE infrastructure is a key part of the computing environment for the simulation, processing and analysis of the data of the Large Hadron Collider (LHC) experiments (ALICE, ATLAS, CMS and LHCb). The example of the LHC experiments illustrates well the motivation behind Grid technology. The LHC accelerator will start operation in 2007, and the total data volume per experiment is estimated to be a few PB/year at the beginning of the machine's operations, leading to a total yearly production of several hundred PB for all four experiments around 2012. The processing of these data will require large computational and storage resources and the associated human resources for operation and support. It was not considered feasible to fund all of the resources at one site, and so it was agreed that the LCG computing service would be implemented as a geographically distributed Computational Data Grid. This means that the service will use computational and storage resources installed at a large number of computing sites in many different countries, interconnected by fast networks. At the moment, the EGEE infrastructure counts 160 sites distributed over more than 30 countries. These sites hold 15000 CPUs and about 9 PB of storage capacity. The Grid middleware hides much of the complexity of this environment from the user, organizing all the resources in a coherent virtual computer centre. The computational and storage capability of the Grid is attracting other research communities, and we would like to discuss the general patterns observed in supporting new applications and porting them onto the EGEE infrastructure. In this talk we present our experiences in porting three different applications onto the Grid, among them Geant4 and UNOSAT. Geant4 is a toolkit for the Monte Carlo simulation of the interaction of particles with matter. It is applied to a wide field of research including high energy physics and nuclear experiments, as well as medical, accelerator and space physics studies. ATLAS, CMS, LHCb, BaBar, and HARP are actively using Geant4 in production. UNOSAT is a United Nations initiative to provide the humanitarian community with access to satellite imagery and geographic information system services. UNOSAT is implemented by the UN Institute for Training and Research (UNITAR) and managed by the UN Office for Project Services (UNOPS). In addition, partners from public and private organizations constitute the UNOSAT consortium. Among these partners, CERN participates actively, providing the computational and storage resources needed for their image analysis. During the gridification of the UNOSAT project, the collaboration with the developers of the ARDA group to adapt the AMGA software to the UNOSAT expectations was extremely important. The satellite images provided by UNOSAT have been stored in storage systems at CERN and registered in the LCG File Catalogue (LFC). The registered files are identified by easy-to-remember Logical File Names (LFNs), and the LFC is then able to map these LFNs to the physical locations of the files. Within the UNOSAT infrastructure, users provide as input the coordinates of each image. AMGA is able to map these coordinates (treated as metadata) to the corresponding LFNs of the files registered on the Grid; the LFC then finds the physical location of the images. A successful model for guaranteeing a smooth and efficient entry into the Grid environment is to identify an expert who supports and works with the new community.
This person assists the community during the implementation and execution of their applications on the Grid and also acts as the Virtual Organization (VO) contact person with the EGEE sites. He or she works together with the EGEE deployment team and with the site managers to set up the services needed by the experiment or community, observing the relevant security and access policies. Once a new community attains a good level of maturity and confidence, a VO Manager is identified within the user community. This talk will report a number of concrete examples and will try to summarize the main lessons. We believe that this should be extremely interesting for new communities, in order to identify possible problems early and prepare the appropriate solutions. In addition, this support scheme would also be very interesting as a model, for example, for local application support in EGEE II.
Speaker: Dr Patricia Mendez Lorenzo (CERN IT/PSS)
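A schematic sketch of the two-step lookup described in the abstract above: user-supplied image coordinates are resolved to a Logical File Name through the metadata catalogue (the role played by AMGA), and the LFN is then resolved to physical replicas (the role played by the LFC). The dictionaries and names below are purely illustrative stand-ins for the real catalogue services.

    # Purely illustrative stand-ins for the AMGA and LFC catalogues.
    metadata_catalogue = {   # image coordinates -> LFN (role of AMGA)
        (36.8, 10.2): "lfn:/grid/unosat/example_image.tif",
    }
    replica_catalogue = {    # LFN -> physical replicas (role of the LFC)
        "lfn:/grid/unosat/example_image.tif": [
            "srm://storage.example.org/unosat/example_image.tif",
        ],
    }

    def locate_image(lat, lon):
        """Map image coordinates to physical file locations via the two catalogues."""
        lfn = metadata_catalogue[(lat, lon)]    # metadata query (AMGA's role)
        return replica_catalogue[lfn]           # replica lookup (the LFC's role)

    print(locate_image(36.8, 10.2))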
• 29
International Telecommunication Union Regional Radio Conference and the EGEE grid
Speaker: Dr Andrea Manara (ITU BR)
• 30
ArchaeoGRID, a GRID for Archaeology
Speaker: Prof. Pier Giovanni Pelfer (Dept. Physics, University of Florence and INFN, Italy)
• 3:45 PM
Discussion
• 4:00 PM
Coffee break
• 31
Worldwide ozone distribution by using Grid infrastructure
ESRIN: L. Fusco, J. Linford, C. Retscher; IPSL: C. Boonne, S. Godin-Beekmann, M. Petitdidier, D. Weissenbach; KNMI: W. Som de Cerff; SCAI-FHG: J. Kraus, H. Schwichtenberg; UTV: F. Del Frate, M. Iapaolo. Satellite data processing presents a challenge for any computing resources due to the large volume of data and number of files. The vast number of data sets and databases is distributed among different countries and organizations, and in practice investigations are limited to sub-sets: the data cannot be explored completely, on the one hand because of the limitations of local computing and storage power, and on the other hand because of the lack of tools adapted to handling, controlling and analysing such large data sets efficiently. In order to check the capability of a Grid infrastructure to meet those requirements, an application based on ozone measurements was designed and ported first to DataGrid, then to EGEE and to the local Grid at ESRIN. The satellite data are provided by the GOME experiment aboard the ERS satellite. From the ozone vertical total content, ozone profiles have been retrieved using two different algorithm schemes: one based on an inversion protocol (KNMI), the other on a neural network approach (UTV). The porting to DataGrid was successful; however, some functionalities were missing to make the application operational. In EGEE, the infrastructure has proven as reliable as a local Grid. The second part of the application has been the validation of those satellite ozone profiles against profiles measured by ground-based lidars. The goal was to find collocated observations; metadata databases were built to solve this problem. The result has been the production of 7 years of data on EGEE and on the local Grid at ESRIN with two versions of the neural network algorithm, and of several months of data with the inversion algorithm, amounting to around 100 000 files registered on EGEE. The validation of this set of data was then carried out using all the lidar profiles available in the NDSC (Network for the Detection of Stratospheric Change) databases. To find collocated data, an OGSA-DAI metadata server has been implemented, and geospatial queries permit searching for the orbit passing over a given lidar site. The second piece of work, started during DataGrid, has been the development of a portal specific to the ozone application described above, later extended to other satellite data such as MERIS. The role of this portal is to provide an operational, user-friendly way to use the Grid infrastructure; it supplies the functionalities the Grid infrastructure is missing. EGEE offers the possibility of storing all the ozone data obtained by satellite experiments (GOME, GOMOS, MIPAS, ...) as well as by ground-based networks of lidars and radiosoundings. The next goal is to be able to determine the distribution of ozone at a given location and/or time by combining all the existing databases. In this presentation, the scientific and operational interest will be pointed out.
Speaker: Monique Petitdidier (IPSL)
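A minimal sketch, with invented coordinates, of the collocation criterion described in the abstract above: for a given lidar station, select the satellite ozone profiles falling within a chosen great-circle distance. The real application uses geospatial queries against an OGSA-DAI metadata server; this only illustrates the matching idea.

    # Illustration of the collocation criterion (not the OGSA-DAI implementation).
    import math

    def great_circle_km(lat1, lon1, lat2, lon2):
        """Haversine distance between two points on Earth, in km."""
        r = 6371.0
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    lidar_station = (43.9, 5.7)              # invented lidar site coordinates
    profiles = [                             # (lat, lon, profile id) of satellite profiles
        (44.1, 5.9, "gome_profile_A"),
        (51.0, 12.0, "gome_profile_B"),
    ]

    collocated = [pid for lat, lon, pid in profiles
                  if great_circle_km(lat, lon, *lidar_station) < 300.0]
    print(collocated)                        # only the profile passing near the station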
• 32
On-line demonstration of Flood application at EGEE User Forum
The flood application was successfully demonstrated at the second EGEE review in December, and we would like to demonstrate it at the EGEE User Forum for Grid application developers and Grid users. The flood application consists of several numerical models of meteorology, hydrology and hydraulics. A portal has been developed for convenient use of the flood application. The portal has four main modules: • Workflow management module: for managing the execution of tasks with data dependencies • Data management module: allows users to search and download data from storage elements • Visualization module: shows the output of the models in several forms: text, picture, animation and virtual reality • Collaboration module: allows users to communicate with each other and cooperate on flood forecasting. The demonstration will be done on the GILDA demonstration testbed. Job execution in the Grid testbed will be performed using the gLite middleware. The aim of the demonstration is to show how to implement complicated grid applications with many models and support modules, and also the FloodGrid portal, which allows users to run the application without knowledge of grid computing.
Speaker: Dr Viet Tran (Institute of Informatics, Slovakia)
• 33
Solid Earth Physics on EGEE
Speaker: Geneviève Moguilny (Institut de Physique du Globe de Paris)
• 5:15 PM
Discussion
• 34
Expanding GEOsciences on DEmand
Speaker: Mr Gael Youinou (Unknown)
• 35
Requirements of Climate applications on Grid infrastructures: C3-Grid and EGEE
Speaker: Dr Joachim Biercamp (DKRZ)
• 6:00 PM
Discussion
• 1d: Computational Chemistry - Lattice QCD - Finance 40/4-C01

40/4-C01

CERN

• 2:00 PM
Introduction
• 36
Grid computation for Lattice QCD
This is the first use of the Grid infrastructure for an expensive lattice QCD calculation, performed under the VO theophys. It concerns the study on the lattice of the SU(3) Yang-Mills topological charge distribution, which is one of the most important non-perturbative features of the theory. The first moment of the distribution is the topological susceptibility, which enters in the famous Witten-Veneziano formula (see Luigi Del Debbio, Leonardo Giusti, Claudio Pica, Phys. Rev. Lett. 94:032003, 2005 and references therein). The codes adopted in this project are optimized to run with high efficiency on a single PC, using the SSE2 feature of Intel and AMD processors to improve performance (L. Giusti, C. Hoelbling, M. Luscher, H. Wittig, Comput. Phys. Commun. 153:31-51, 2003). Different codes based on a parallel structure are already being developed and tested. They need an interconnection bandwidth among nodes greater than 250 MBytes/s, and we hope they can be sent to the Grid in the future. The first physical results of the project are planned to be presented at the Lattice 2006 international symposium at the end of July in Tucson by the collaboration (L. Del Debbio (Edinburgh), L. Giusti (CERN), S. Petrarca (Univ. of Roma 1), B. Taglienti (INFN, Sez. di Roma 1)). The production on a "small" SU(3) lattice (12^4) at beta=6.0 is finished. The results are very encouraging. We have started a new run on a 14^4 lattice with the same physical volume. Although the statistics are still insufficient, the signal is confirmed. The total CPU time used from the beginning of the work (20-10-2005) up to now (26-01-2006) under the VO theophys turns out to be 70000 hours. The total number of jobs submitted is about 6500. Failures (approximately): 500 due to non-SSE2 CPUs and 1000 jobs aborted for unknown reasons. A typical 12^4 job requires 220 MB of RAM; the whole production has been divided into small chunks requiring approximately 12 hours of CPU (longer jobs are prone to being aborted by the Grid system). Every job reads and writes 5.7 MB from/to a storage element. The resources needed by a typical 14^4 job are larger by nearly a factor of 2 for CPU, RAM and storage. We organized the production in 120 simultaneous jobs, each running on a single processor. The job length is chosen as a compromise between the job time limit imposed by the Grid system and the bookkeeping activity needed to collect the result and start a new job.
Speaker: Dr Giuseppe Andronico (INFN SEZIONE DI CATANIA)
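A back-of-the-envelope check of the production figures quoted in the abstract above: about 6500 chunks of roughly 12 CPU-hours each account for the quoted CPU budget, and with 120 jobs running simultaneously one can estimate the wall-clock duration of the campaign. The script only makes this arithmetic explicit.

    # Sanity-check arithmetic for the production figures quoted in the abstract.
    jobs_submitted  = 6500     # total jobs submitted
    hours_per_chunk = 12.0     # approximate CPU-hours per chunk
    concurrent_jobs = 120      # jobs kept running simultaneously

    total_cpu_hours = jobs_submitted * hours_per_chunk
    wall_clock_days = total_cpu_hours / concurrent_jobs / 24.0
    # ~78 000 CPU-hours submitted (about 70 000 h of useful CPU after failures),
    # i.e. roughly 27 days of continuous running at 120 concurrent jobs.
    print(f"{total_cpu_hours:.0f} CPU-hours, ~{wall_clock_days:.0f} days wall clock")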
• 37
SALUTE – GRID Application for problems in quantum transport
Authors: E. Atanassov, T. Gurov, A. Karaivanova and M. Nedjalkov, Department of Parallel Algorithms, Institute for Parallel Processing, Bulgarian Academy of Sciences. E-mails: {emanouil, gurov, anet, mixi}@parallel.bas.bg. SALUTE (Stochastic ALgorithms for Ultra-fast Transport in sEmiconductors) is an MPI Grid application developed for solving computationally intensive problems in quantum transport. Monte Carlo (MC) methods for quantum transport in semiconductors and semiconductor devices have been actively developed during the last decade. If temporal or spatial scales become short, the evolution of the semiconductor carriers cannot be described in terms of Boltzmann transport [1], and therefore a quantum description is needed. We note the importance of active investigations in this field: nowadays nanotechnology provides devices and structures where carrier transport occurs at nanometer and femtosecond scales. As a rule, quantum problems are very computationally intensive and require parallel and Grid implementations. SALUTE is a pilot grid application developed at the Department of Parallel Algorithms, Institute for Parallel Processing - BAS, in which the stochastic approach relies on numerical MC theory applied to the integral form of the generalized electron-phonon Wigner equation. The Wigner equation for the nanometer and femtosecond transport regime is derived from a model of three equations based on the generalized Wigner function [2]. The full version of the equation poses serious numerical challenges. Two major formulations of the equation (for the homogeneous and inhomogeneous cases) are studied using SALUTE. The physical model in the first formulation describes a femtosecond relaxation process of optically excited electrons which interact with phonons in a one-band semiconductor [3]. The interaction with phonons is switched on after a laser pulse creates an initial electron distribution. Experimentally, such processes can be investigated using ultra-fast spectroscopy, where the relaxation of electrons is explored during the first hundreds of femtoseconds after the optical excitation. In our model we consider a low-density regime, where the interaction with phonons dominates the carrier-carrier interaction. In the second formulation we consider a highly non-equilibrium electron distribution which propagates in a quantum semiconductor wire [4]. The electrons, which can be initially injected or optically generated in the wire, begin to interact with three-dimensional phonons. The evolution of such a process is quantum both in real space, due to the confinement of the wire, and in momentum space, due to the early stage of the electron-phonon kinetics. A detailed description of the algorithms can be found in [5, 6, 7]. Monte Carlo applications are widely perceived as computationally intensive but naturally parallel. The subsequent growth of computer power, especially that of parallel computers and distributed systems, made possible the development of distributed MC applications performing more and more ambitious calculations. Compared to a parallel computing environment, a large-scale distributed computing environment or Computational Grid has a tremendous amount of computational power. Let us mention the EGEE Grid, which today consists of over 18900 CPUs in 200 Grid sites. SALUTE solves an NP-hard problem concerning the evolution time. On the other hand, SALUTE consists of Monte Carlo algorithms which are inherently parallel.
Thus, SALUTE is a very good candidate for implementation on MPI-enabled Grid sites. By using the Grid environment provided by the EGEE project middleware, we were able to reduce the computing time of Monte Carlo simulations of ultra-fast carrier transport in semiconductors. The simulations are parallelized on the Grid by splitting the underlying random number sequences. Successful tests of the application were performed at several Bulgarian and South East European EGEE Grid sites using the Resource Broker at IPP-BAS. The MPI version was MPICH 1.2.6, and the execution was performed on clusters using both the pbs and lcgpbs jobmanagers, i.e. with shared or non-shared home directories. The test results show excellent parallel efficiency. Obtaining results for larger evolution times requires more computational power, which means that the application should run on larger sites or on several sites in parallel. The application can provide results for other types of semiconductors, such as Si, or for composite materials. Figure 1: Distribution of optically generated electrons in a quantum wire. REFERENCES [1] J. Rammer, Quantum transport theory of electrons in solids: A single-particle approach, Reviews of Modern Physics, series 63, no 4, 781-817, 1991. [2] M. Nedjalkov, R. Kosik, H. Kosina, and S. Selberherr, A Wigner Equation for Nanometer and Femtosecond Transport Regime, in: Proceedings of the 2001 First IEEE Conference on Nanotechnology (October, Maui, Hawaii), IEEE, 277-281, 2001. [3] T.V. Gurov, P.A. Whitlock, "An efficient backward Monte Carlo estimator for solving of a quantum kinetic equation with memory kernel", Mathematics and Computers in Simulation, Vol. 60, 85-105, 2002. [4] M. Nedjalkov, T. Gurov, H. Kosina, D. Vasileska and V. Palankovski, Femtosecond Evolution of Spatially Inhomogeneous Carrier Excitations, Part I: Kinetic Approach, to appear in Lecture Notes in Computer Science, Springer-Verlag Berlin Heidelberg, Vol. 3743 (2006). [5] E. Atanassov, T. Gurov, A. Karaivanova, and M. Nedjalkov, SALUTE - an MPI GRID Application, in: Proceedings of the 28th International Convention MIPRO 2005, May 30-June 3, Opatija, Croatia, 259-262, 2005. [6] T.V. Gurov, M. Nedjalkov, P.A. Whitlock, H. Kosina and S. Selberherr, Femtosecond relaxation of hot electrons by phonon emission in presence of electric field, Physica B, Vol. 314, p. 301, 2002. [7] T.V. Gurov and I.T. Dimov, A Parallel Monte Carlo Method for Electron Quantum Kinetic Equation, LNCS, Vol. 2907, Springer-Verlag, 153-160, 2004.
Speaker: Prof. Aneta Karaivanova (IPP-BAS)
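A minimal sketch, with a toy integrand, of the parallelisation scheme mentioned in the abstract above: the Monte Carlo work is split by giving every MPI rank its own independent random-number stream, and the partial sums are reduced on rank 0. This is not SALUTE code; it only shows the splitting of the underlying random sequences.

    # Toy Monte Carlo split over MPI ranks via independent random streams (not SALUTE).
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    # One independent child stream per rank, spawned from a common seed sequence.
    stream = np.random.default_rng(np.random.SeedSequence(2006).spawn(size)[rank])

    n_local = 1_000_000
    x = stream.random(n_local)
    local_sum = float(np.exp(-x * x).sum())    # toy integrand instead of the Wigner kernel

    total_sum = comm.reduce(local_sum, op=MPI.SUM, root=0)
    total_n = comm.reduce(n_local, op=MPI.SUM, root=0)
    if rank == 0:
        print("MC estimate of the integral of exp(-x^2) over [0,1]:", total_sum / total_n)

Run, for example, with "mpirun -np 4 python mc_sketch.py"; each rank then contributes an independent partial estimate.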
• 2:45 PM
Discussion
• 38
The EGRID facility
Speaker: Dr Stefano Cozzini (CNR-INFM Democritos and ICTP)
• 3:15 PM
Discussion
• 39
The Molecular Science challenges in EGEE
Speaker: Osvaldo Gervasi (Department of Mathematics and Computer Science, University of Perugia)
• 40
On the development of a grid enabled a priori molecular simulator
We have implemented on the EGEE production grid GEMS.0, a demo version of our molecular process simulator that deals with gas-phase atom-diatom bimolecular reactions. GEMS.0 takes the parameters of the potential from a data bank and carries out the dynamical calculations by running quasiclassical trajectories [1]. A generalization of GEMS.0 to include the calculation of ab initio potentials and the use of quantum dynamics is under way in collaboration with the members of COMPCHEM [2]. In this communication we report on the implementation of the quantum dynamics procedures. Quantum approaches require the integration of the Schroedinger equation to calculate the scattering matrix S^J(E). The integration of the Schroedinger equation can be carried out using either time-dependent or time-independent techniques. The structure of the computer code performing the propagation in time of the wavepacket (TIDEP) [3] for the Ncond sets of initial conditions is sketched in Fig. 1:

    Read input data: tfin, tstep, system data ...
    Do icond = 1, Ncond
      Read initial conditions: v, j, Etr, J ...
      Perform preliminary and first-step calculations
      Do t = t0, tfin, tstep
        Perform the time step propagation
        Perform the asymptotic analysis to update S
        Check for convergence of the results
      EndDo t
    EndDo icond

Fig. 1. Pseudocode of the TIDEP wavepacket program kernel.

The TIDEP kernel shows strict similarities with that of the trajectory program (ABCtraj) already implemented in GEMS.0. In fact, for a given set of initial conditions, the inner loop of TIDEP propagates the wavepacket recursively over time. The most noticeable difference between this and the trajectory integration is the fact that at each time step TIDEP performs a large number of matrix operations, which increases the memory and computing time requirements by some orders of magnitude. The structure of the time-independent suite of codes [4] is, instead, articulated in a different way. It is made of a first block (ABM) [4] that generates the local basis set and builds the coupling matrix (the integration bed), using also the basis set of the previous sector. This calculation has been decoupled by repeating for each sector the calculation of the basis set of the previous one (see Fig. 2). This allows the calculations to be distributed on the grid. The second block is concerned with the propagation of the solution R matrix from small to large values of the hyperradius, performed by the program LOGDER [4]. For this block, again, the same scheme as for ABCtraj can be adopted to distribute the propagation of the R matrix at given values of E and J, as shown in Fig. 3.

    Read input data: rho_in, rho_fin, rho_step, J, Emax, ...
    Perform preliminary calculations
    Do rho = rho_in + rho_step, rho_fin, rho_step
      Calculate eigenvalues and surface functions for present and previous rho
      Build intersector mapping and intrasector coupling matrices
    EndDo rho

Fig. 2. Pseudocode of the ABM program kernel.

    Read input data: rho_in, rho_fin, rho_step, ...
    Transfer the coupling matrices generated by ABM from disk
    Do icond = 1, Ncond
      Read input data: J, E ...
      Perform preliminary calculations
      Do rho = rho_in, rho_fin, rho_step
        Perform the single sector propagation of the R matrix
      EndDo rho
    EndDo icond

Fig. 3. Pseudocode of the LOGDER program kernel.

References: 1. Gervasi, O., Dittamo, C., Lagana', A.: Lecture Notes in Computer Science 3470, 16-22 (2005). 2. EGEE-COMPCHEM Memorandum of Understanding, March 2005. 3. Gregori, S., Tasso, S., Lagana', A.: Lecture Notes in Computer Science 3044, 437-444 (2004). 4. Bolloni, A., Crocchianti, S., Lagana', A.: Lecture Notes in Computer Science 1908, 338-345 (2000).
Speaker: Antonio Lagana` (1Department of Chemistry, University of Perugia)
• 4:00 PM
Coffee break
• 41
An Attempt at Applying EGEE Grid to Quantum Chemistry
The EGEE Grid Project enables access to huge computing and storage resources. Taking this opportunity, we have tried to identify chemical problems that could be computed in this environment. Some of the results considered within this work are presented, with the description focused on the requirements for the computational environment as well as on techniques for Grid-enabling computations based on packages like GAMESS and GAUSSIAN. Recently a lot of work has been done on parallelizing the existing quantum chemistry codes and developing new ones. That allows calculations to run much faster now than ten years ago. However, there still exist tasks where, without a large number of processors, it is not possible to obtain satisfactory results. The two main challenges are harmonic frequency calculations and ab-initio (AI) molecular dynamics (MD) simulations. The former are mainly used to analyze molecular vibrations. Despite the fact that the algorithm for analytic harmonic frequency calculations has been known for over 20 years, only a few quantum chemical codes have it implemented. The others still use a numerical scheme in which, for a molecule of N atoms, a number of independent steps (energy + gradients) proportional to N, and even more for more accurate calculations, have to be done to get the harmonic frequencies. To achieve this, as many processors as possible are needed to accommodate that huge number of calculations. This makes grid technology an ideal solution for that kind of application. The second challenge, MD simulations, are mainly used in cases where a 'static' calculation, such as the determination of Nuclear Magnetic Resonance (NMR) chemical shifts, gives wrong results. MD usually consists of two steps. In the first the nuclear gradients are calculated; in the second, based on the obtained gradients, the actual classical forces acting on each atom are calculated. Knowing these forces, one can estimate accelerations and velocities and predict the new position of each atom after a given short period of time (the so-called time step). Finally the whole process is repeated for every new position of each atom. In the case of the mentioned NMR experiment, we are interested in the average value of the chemical shift over the simulation. Of course, NMR calculations are also very time consuming themselves and have to be done for many different geometries, which again makes grid technology an ideal solution for the final NMR chemical shift calculations. We present here two kinds of calculations. First we show results for geometry optimization and frequency calculations for a few carotenoids. These molecules are of almost constant interest since they cooperate with chlorophyll in the photosynthesis process. All the calculations have been done within the EGEE Grid (VOCE VO). We also present an example of MD calculations and share our knowledge about the kinds of problems that can be encountered during such studies.
Speaker: Dr Mariusz Sterzel (Academic Computer Centre "Cyfronet")
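A minimal sketch, with a hypothetical molecule size, of why numerical frequency calculations map so naturally onto a grid: each displaced geometry is an independent energy-plus-gradient job, so the work can simply be enumerated and submitted in parallel. Actual job submission and the quantum chemistry package calls are omitted.

    # Enumerate independent gradient jobs for a numerical Hessian (illustrative sketch).
    import itertools

    def displacement_jobs(n_atoms, step=0.01):
        """One job per (atom, Cartesian axis, +/- displacement): 6*N independent jobs."""
        for atom, axis, sign in itertools.product(range(n_atoms), "xyz", (+1, -1)):
            yield {"atom": atom, "axis": axis, "displacement": sign * step}

    # A hypothetical carotenoid-sized molecule of about 100 atoms.
    jobs = list(displacement_jobs(100))
    print(len(jobs), "independent gradient calculations")   # 600 jobs, all grid-friendly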
• 4:45 PM
Discussion
• Poster and Demo session + cocktail: Demo and poster session
• 42
An efficient method for fine-grained access authorization in distributed (Grid) storage systems
Speaker: Andreas Peters (CERN)
• 43
Application Identification and Support in BalticGRID
Introduction: The Baltic Grid project, an FP6 programme involving 10 leading institutions in six countries, started in November 2005. It aims to i) develop and integrate the research and education computing and communication infrastructure of the Baltic States into the emerging European Grid infrastructure, ii) bring the knowledge of Grid technologies and the use of Grids in the Baltic States to a level comparable to that in EU member states, and iii) further engage the Baltic States in policy and standards setting activities. The integration of the Baltic States into the European Grid infrastructure focuses primarily on extending EGEE (with which four partners are already engaged) to the Baltic States. The Baltic Grid takes advantage of the existing local e-infrastructures in the region. The Baltic Grid project is of high strategic importance for the Baltic States and is designed to give a rapid build-up of a Grid infrastructure, contributing to enabling the new member states' participation in the European Research Area. One of the most important steps in Baltic Grid development is application identification and support. This activity will be carried out through three tasks. Pilot Applications: Baltic Grid intends to initiate three pilot applications for validation and for demonstration of successful scientific use. The high-energy physics application includes statistical data analysis, production of Monte Carlo samples and distributed data analysis, nuclear and sub-nuclear physics, condensed matter physics and many-body problems. It will be implemented because of the critical importance of Grids to this community and its relative maturity. The materials science application covers research areas with a substantial number of potential Grid users among scientists in the Baltic States. It includes tools for establishing the geometrical structure of various organic, metal-organic and inorganic materials; understanding optical and magnetic properties of molecular derivatives; and predicting new technologies and creating new materials with specified characteristics. Modelling and simulation of heterogeneous processes in chemistry, biochemistry, geochemistry, electrochemistry, biology and engineering will be implemented because of the strategic importance of materials science to the Baltic States and its substantial computing needs. A bioinformatics application will be implemented to provide tools and computing procedures for sequence pattern discovery and gene regulatory network reconstruction, inference of haplotype structure and pharmacogenetics-related association studies, modelling and exploration of the mechanisms of enzymatic catalysis, de novo design of proteins, quantum-mechanical investigations of organic molecules and their applications, refinement of 3D biological macromolecule models against X-ray diffraction or NMR data, and modelling of biosensors and other reaction-diffusion processes. This application also intends to support the collaborative efforts of scientists in the Baltic States in this highly distributed community, which needs to share data from many sources and a diverse set of tools. Special Interest Groups: The special interest groups (SIG) task aims to improve communication among many separate research groups with similar or related R&D interests. The development and implementation of SIGs is a relatively new idea in grid computing infrastructures, based on semantic representation methods and tools and leading to the enhancement of services and applications with knowledge and semantics.
Research areas under consideration for SIG development and implementation are: modelling of the Baltic Sea eco-system (together with BOOS, a future operational oceanographic service to the marine industry in the Baltic region), hydrodynamic environmental models for sustainable development of the Baltic Sea coastal zone, environmental impact assessment and environmental process modelling, life sciences and medicine. Application Adaptation Support: This is a specific activity aiming to organize and initiate communication between application experts and Grid experts, facilitating rapid Grid adaptation and deployment of applications through the formation of an Application Expert Group. This group will analyze applications, identify required Grid technologies and provide consulting services to application developers. The services will include assistance with integration with the Migrating Desktop to enable GUI-based access to the BG infrastructure and services, ensuring interoperability with the BG middleware. Performance studies to find bottlenecks in the deployed applications may be carried out if needed, using tools for performance evaluation such as G-PM and OCM-G, developed in the CrossGrid project.
Speaker: Dr Algimantas Juozapavicius (associate professor)
• 44
Applications integrated on the GILDA testbed
Speakers: Dr Antonio Calanducci (INFN Sez. Catania - Italy), Dr Giuseppe La Rocca (INFN Sez. Catania - Italy)
• 45
CMS Dashboard of Grid Activity
The CMS Dashboard project aims to provide a single entry point to the monitoring data collected from the CMS distributed computing system. The monitoring information collected in the CMS Dashboard allows users to follow the processing of CMS jobs on the LCG, EGEE and OSG grid infrastructures. The Dashboard supports tracing of job execution failures on the Grid and of errors due to problems with the experiment-specific applications. In addition, the Dashboard is able to present an estimation of the I/O rates between the worker nodes and data storage, and helps keep a record of the sharing of resources between production and analysis groups and different users. One of the final goals is to discover inefficiencies in the data distribution and problems in the data publishing. The Dashboard database combines the Grid-specific data from the Logging and Bookkeeping system via R-GMA and the CMS-specific data via the MonALISA monitoring system. A web interface to the Dashboard database provides access to the monitoring data in interactive mode and through a set of predefined views. The interactive mode makes it possible to get information at a detailed level, which is very important for tracking various problems.
Speaker: Mr Juha HERRALA (CERN)
• 46
Demo: LHCb data analysis using Ganga
The ARDA-LHCb prototype activity is focusing on the GANGA system (a joint ATLAS-LHCb project). The main idea behind GANGA is that physicists should have a simple interface to their analysis programs. GANGA allows users to prepare the application, organize the submission and gather results via a clean Python API. The details needed to submit a job on the Grid (like special configuration files) are factorised out and applied transparently by the system. In other words, it is possible to set up an application on a portable PC, then run some higher-statistics tests on a local facility (like LSF at CERN) and finally analyse all the available statistics on the Grid just by changing the parameter which identifies the execution back-end.
Speaker: Andrew Maier (CERN)
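A small self-contained mock-up of the workflow described above: the same job description is first tested locally and then sent to a batch facility or to the Grid simply by changing the execution back-end. The class is an illustration of the idea, not the GANGA API.

    # Self-contained mock of the "change only the back-end" idea (not GANGA itself).
    from dataclasses import dataclass, replace

    @dataclass
    class AnalysisJob:
        executable: str
        backend: str          # "Local", "LSF" or "LCG" in the workflow described above

        def submit(self):
            print(f"submitting {self.executable} to {self.backend}")

    job = AnalysisJob(executable="myAnalysis.py", backend="Local")
    job.submit()                              # quick test on a portable PC

    replace(job, backend="LSF").submit()      # higher-statistics test on a local facility
    replace(job, backend="LCG").submit()      # full statistics on the Grid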
• 47
Speaker: Prof. Peter Kacsuk (MTA SZTAKI)
• 48
gLite Service Discovery for users and applications
In order to make use of the resources of a grid, to submit a job or query information for example, a user must contact a service that provides the capability, usually via a URL. Grid services themselves must often contact other services to do their work. In order to locate services, some kind of dynamic service directory is required, and there exist several grid information systems, such as R-GMA and BDII, that can provide this service. However, each information system has its own unique interface, so JRA1 have developed a standard Service Discovery API to hide these differences from applications that simply want to locate services that meet their criteria. The gLite Service Discovery API provides a standard interface to access service details published by information systems. There are four methods available for discovering services: listServices, listAssociatedServices, listServicesByData and listServicesByHost. These all take a range of arguments for narrowing the search and all return a list of service structures. Once you have found a service it is then possible to use other methods to obtain more detailed information about it (using its unique id). These methods are: getService, getServiceDetails, getServiceData, getServiceDataItem, getServiceSite and getServiceWSDL. The gLite Service Discovery API provides interfaces for the Java and C/C++ programming languages and a command line tool (glite-sd-query). It uses plugins for the R-GMA and BDII information systems, and for retrieving the information from an XML file. Other plugins (e.g. UDDI) could be developed if needed. JRA1 also provide a service tool, rgma-servicetool, to allow any service running on a host to easily publish service data via R-GMA. All a service has to do is to provide a description file that contains static information about itself and the name of a command to call, plus any required parameters, in order to obtain the current state of the service. This information is then published via R-GMA to a number of tables that conform to the GLUE specification. The data published to these tables are used by the R-GMA gLite Service Discovery implementation. Any service, including VO services, can make use of rgma-servicetool. The existing system assumes that the underlying information system has been correctly configured. In the case of R-GMA this means that the client needs to know the local R-GMA server (sometimes known as a "Mon box"). A user coming to an unknown environment with a laptop needs to first find the information system before interacting with it. This is the well-known bootstrapping problem, which can be solved by IP multicast techniques. We will provide discovery of local services without making use of existing information systems and with near-zero configuration. Clients send a multicast query to a multicast group, and services that satisfy the query respond directly to the client using unicast. This capability will initially be added to R-GMA services. Once this has been done it will be possible to introduce additional R-GMA servers at a site, for example to take increased load, without the need to reconfigure any clients. The existing SD API with the R-GMA plugin will immediately benefit from the new server. Subsequently this component, suitably packaged, will be made available to other gLite services. The combination of the rgma-servicetool and the gLite Service Discovery makes it simple for any service to make itself known and then for users and high-level applications to find these services. In addition, once the bootstrapping code is developed and added to R-GMA, the configuration of R-GMA, and thereby SD with the R-GMA plugin, will become trivial.
Speaker: Mr John Walk (RAL)
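A small self-contained mock-up of the plugin idea behind the Service Discovery API described above: user code calls one generic listServices-style interface, and the differences between underlying information systems (R-GMA, BDII, an XML file, ...) are hidden behind interchangeable plugins. The classes are illustrative, not the gLite implementation.

    # Mock-up of a plugin-based service discovery layer (illustration only).
    from abc import ABC, abstractmethod

    class SDPlugin(ABC):
        @abstractmethod
        def list_services(self, service_type):
            ...

    class FilePlugin(SDPlugin):
        """Stands in for the XML-file plugin; R-GMA or BDII plugins would query those systems."""
        def __init__(self, records):
            self.records = records
        def list_services(self, service_type):
            return [r for r in self.records if r["type"] == service_type]

    class ServiceDiscovery:
        def __init__(self, plugin):
            self.plugin = plugin              # swap plugins without touching user code
        def list_services(self, service_type):
            return self.plugin.list_services(service_type)

    sd = ServiceDiscovery(FilePlugin([{"type": "WMS", "endpoint": "https://wms.example.org:7443"}]))
    print(sd.list_services("WMS"))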
• 49
HGSM Web Application
This is a web application that serves as a front-end to the database that keeps information about the grid sites (clusters), their admins, email and phone contacts, other contact people, site nodes and resources, downtimes etc. These sites are organized by country and countries are organized by regions. The admins of each site can also update the information about the site.
Speaker: Mr Dashamir Hoxha (Institute of Informatics and Applied Informatics (INIMA), Tirana, Albania)
• 50
Internal Virtual Organizations in the RDIG-EGEE Consortium
At the beginning of 2005, the formal procedures and the proper administrative structures for the creation and registration of internal RDIG-EGEE virtual organizations were established in the Russian Data Intensive Grid (RDIG) consortium. The Service Center for Registration of Virtual Organizations is accessible through the URL http://rdig-registrar.sinp.msu.ru/newVO.html. All the documents and rules, in particular the basic document “Creation and Registration of Virtual Organizations in the framework of RDIG-EGEE: Rules and Procedure” (in Russian), and questionnaire examples can be found there (http://rdig-registrar.sinp.msu.ru/VOdocs/newVOinRDIG.html). A Council on RDIG-EGEE extension has been formed; it inspects all requests for new virtual organizations to be created. The aim of the creation of the RDIG-EGEE virtual organizations is to serve the national scientific projects and to test new application areas prior to including them in the global EGEE infrastructure. Currently there are 6 RDIG-EGEE internal virtual organizations with 42 members. Brief information on the Fusion VO for ITER project activities in Russia, the eEarth VO for geophysics and cosmic research tasks (http://www.e-earth.ru/), and the PHOTON VO for the PHOTON and SELEX experiments (http://egee.itep.ru/PHOTON/index29d5en.html) is presented in the poster.
Speaker: Dr Elena Tikhonenko (Joint Institute for Nuclear Research (JINR))
• 51
MEDIGRID: Mediterranean Grid of Multi-risk data and Models
Speaker: Dr Ladislav Hluchy (Institute of Informatics, Slovakia)
• 52
Meteorology and Space Weather Data Mining Portal
We will demonstrate an environmental data mining project, the Environmental Scenario Search Engine (ESSE), including a secure web application portal for interactive searching for events over a grid of environmental data access and mining web services hosted by OGSA-DAI containers. The web services are grid proxies for database clusters holding terabytes of high-resolution meteorological and space weather reanalysis data covering the past 20-50 years. The data mining is based on fuzzy logic, making it possible to describe the events being searched for in natural-language terms, such as “very cold day”. The ESSE portal allows parallel data mining across disciplines for correlated events in space, atmosphere and ocean. The ESSE data web services are installed in the USA, Russia, South Africa, Australia, Japan, and China. The EGEE infrastructure facilitates sharing of the environmental data and grid services with the European environmental sciences community. The work is done in cooperation with the National Geophysical Data Center, NOAA, and supported by a grant from Microsoft Research Ltd.
Speakers: Mr Alexey Poyda (Moscow State University), Mr Dmitry Mishin (Institute of Physics of the Earth Russian Acad. Sci.), Dr Mikhail Zhizhin (Geophysical Center Russian Acad. Sci.)
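A small sketch of the fuzzy-logic idea mentioned above: a natural-language condition such as "very cold day" becomes a membership function scoring each record between 0 and 1, and the search returns the days with the highest score. The thresholds and data are invented for illustration.

    # Illustration of fuzzy scoring for a "very cold day" query (invented thresholds).
    def very_cold(temperature_c, cold_below=-20.0, warm_above=0.0):
        """Membership in 'very cold': 1 below cold_below, 0 above warm_above, linear in between."""
        if temperature_c <= cold_below:
            return 1.0
        if temperature_c >= warm_above:
            return 0.0
        return (warm_above - temperature_c) / (warm_above - cold_below)

    daily_mean_t = {"2005-01-12": -25.0, "2005-01-13": -8.0, "2005-01-14": 3.0}
    ranked = sorted(daily_mean_t.items(), key=lambda kv: very_cold(kv[1]), reverse=True)
    print(ranked)   # the most "very cold" days come first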
• 53
Migrating Desktop - graphical front-end to grid - On-line Demonstration
Speakers: Marcin Plociennik (PSNC), Pawel Wolniewicz (PSNC)
• 54
Parametric study workflow support by P-GRADE portal and MOTEUR workflow enactor
Speaker: Mr Gergely Sipos (MTA SZTAKI)
• 55
Replication on the AMGA Metadata Catalogue
Speaker: Nuno Filipe De Sousa Santos (Universidade de Coimbra)
• 56
Scientific data audification within GRID: from Etna volcano seismograms to text sonification
Data audification is the representation of data by sound signals; it can be considered the acoustic counterpart of graphic data visualization, a mathematical mapping of information from data sets to sounds. Data audification is currently used in several fields and for different purposes: science and engineering, education and training, in most cases to provide a quick and effective data analysis and interpretation tool. Although most data analysis techniques are exclusively visual in nature (i.e. they are based on the possibility of looking at graphical representations), data presentation and exploration systems could benefit greatly from the addition of sonification capabilities. In addition, sonic representations are particularly useful when dealing with complex, high-dimensional data, or in data monitoring tasks where it is practically impossible to use visual inspection. More interesting and intriguing aspects of data sonification concern the possibility of describing, through sound, patterns or trends which would hardly be perceivable otherwise. Two examples in particular will be discussed in this paper, the first one coming from the world of geophysics and the second one from linguistics.
Speaker: Domenico Vicinanza (Univ. of Salerno + INFN Catania)
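A minimal sketch of the mapping the abstract describes: a data series (here a synthetic, seismogram-like decaying oscillation) is rescaled and written out as an audio waveform. The frequency, duration and synthetic data are arbitrary; the point is only the data-to-sound mapping.

    # Sketch: turn a (synthetic) data series into an audible WAV file.
    import math
    import struct
    import wave

    sample_rate = 44_100
    n = int(sample_rate * 2.0)                       # two seconds of audio
    # Synthetic "seismogram": a decaying oscillation standing in for real data.
    data = [math.exp(-3.0 * t / n) * math.sin(2 * math.pi * 440 * t / sample_rate)
            for t in range(n)]

    peak = max(abs(v) for v in data)
    with wave.open("audified.wav", "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)                          # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"".join(struct.pack("<h", int(32767 * v / peak)) for v in data))
    print("wrote audified.wav")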
• 57
Secured Medical Data Management on the EGEE grid
Speaker: Dr Johan Montagnat (CNRS)
• 58
Sustainable management of groundwater exploitation using Monte Carlo simulation of seawater intrusion in the Korba aquifer (Tunisia)
Worldwide, seawater intrusion and the salinisation of coastal aquifers and soils are a major threat to food production. While the physico-chemical processes triggering the transport and accumulation of salts in these regions are relatively well known and well described by a set of partial differential equations, it is often extremely difficult to model these phenomena accurately because of the lack of an accurate data set. On the one hand, the physical parameters (porosity, permeability, dispersivity) that control groundwater flow are extremely variable in space within geological media and are only measured at some specific locations; on the other hand, the forcing terms (pumping, precipitation, etc.) are often not measured directly in the field. The result is a high level of uncertainty. The problem is how to take rational decisions toward sustainable water management in such a context. One possibility explored within this work is to run a large set of model simulations with stochastic parameters by means of the EGEE Grid infrastructure and to define robust and sustainable water management decisions based on a probabilistic analysis of the resulting simulation outputs. This approach is currently being investigated in the Cap Bon peninsula, located 50 km south-east of Tunis, one of the most productive agricultural areas in Tunisia. In this plain the World Bank has shown that major water resource problems could occur in the next decade. One of the major sources of uncertainty in the Cap Bon aquifer system is the pumping rates and their time evolution. To investigate the impact of this source of uncertainty, first a geostatistical model of the spatial distribution of the pumping has been constructed, and then the Grid has been used to run a 3D density-dependent groundwater flow and salt transport model in a Monte Carlo framework. While these results are still preliminary, the Grid computing paradigm clearly offers huge potential in this field. One particularly interesting aspect of this methodology for Tunisian water managers, who do not have access to local computing technology, is the possibility in the near future of running their groundwater flow simulations and uncertainty analyses directly, via a web portal to the Grid. This option has not been tested yet and requires further development.
Speaker: Mr Jawher Kerrou (University of Neuchatel)
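A toy sketch of the Monte Carlo workflow described above: pumping-rate realisations are drawn from an assumed distribution, each realisation drives one (here trivially simplified) model run, and decisions are based on the probability of exceeding a salinity threshold. The model and all numbers are placeholders; on the Grid each realisation would be an independent job.

    # Toy Monte Carlo over uncertain pumping rates (placeholder model, not the Korba model).
    import numpy as np

    rng = np.random.default_rng(42)
    n_realisations = 1000
    pumping = rng.lognormal(mean=np.log(50.0), sigma=0.4, size=n_realisations)  # invented rates

    def salinity_after_10_years(pumping_rate):
        """Placeholder for one density-dependent flow and salt-transport simulation."""
        return 0.5 + 0.03 * pumping_rate + rng.normal(0.0, 0.2)   # g/l, purely illustrative

    salinity = np.array([salinity_after_10_years(q) for q in pumping])
    p_exceed = float((salinity > 2.0).mean())
    print(f"P(salinity > 2 g/l after 10 years) ~ {p_exceed:.2f}")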
• 59
VirtualGILDA: a virtual t-infrastructure for system administrator tutorials
In Grid dissemination activities, teaching the installation of Grid elements plays a very important role. While in tutorials for users the availability of accounts and certificates is enough, in those for administrators a certain number of free machines is needed, and the requirement for a Grid-middleware compliant operating system also arises. The VirtualGILDA training infrastructure aims at offering a set of Virtual Machines (VMs), hosted in Catania and based on VMware technology, with a pre-installed OS and network connectivity: in this way tutors have all the needed machines ready to use. They only need reliable access to the Internet. Pre-installed Grid elements are also possible, in order to provide tutors with a set of preconfigured machines ready to interact with the elements that will be installed during the tutorial. The use of VMware technology is also suitable for on-site tutorials, to avoid problems arising from the wide range of machine and OS types available at each training site. Using VMs, the only requirement is the presence of machines that can run VMware Player, i.e. Linux or Windows hosts.
Speaker: Roberto Barbera (INFN Catania)
• 60
VOCE - Central European Production Grid Service
This contribution describes the grid environment of the Virtual Organization for Central Europe (VOCE). The VOCE infrastructure currently consists of computational resources and storage capacities provided by Central European resource owners. Unlike the majority of other virtual organizations, VOCE is intended to be a generic VO providing an application-neutral environment especially suitable for Grid newcomers, allowing them to quickly gain first experience with Grid computing and to test and evaluate the Grid environment against their specific application needs. VOCE facilities currently provide the basis for a Central European t-infrastructure. The main goal of VOCE is to assist in adapting software for use on a fully production Grid, not within a closed "teaching" environment, even for groups that do not have any Grid/cluster/remote computing experience. The application neutrality of VOCE can be seen as an important feature that allows providing an environment where different application requirements meet and expectations can be fulfilled. All technical aspects related to the supported middleware (LCG, gLite), computing environments (MPI support) and specific user interface support (Charon and the P-GRADE portal) will be discussed, and preliminary user experiences evaluated.
Speaker: Jan Kmunicek (CESNET)
• Thursday, March 2
• User Forum Plenary 2 500/1-001 - Main Auditorium

500/1-001 - Main Auditorium

CERN

• 61
The EGEE infrastructure
Speaker: Ian Bird (CERN)
• 10:30 AM
Coffee break
• 62
gLite status and plans
Speaker: Claudio Grandi (INFN Bologna)
• 12:30 PM
Lunch
• 2a: Workload management and Workflows 40-SS-C01

40-SS-C01

CERN

• 63
Logging and Bookkeeping and Job Provenance services
The Logging and Bookkeeping (LB) service is responsible for keeping track of jobs within a complex Grid environment. Without such a service, users are unable to find out what happened to their lost jobs, and Grid administrators are not able to improve the infrastructure. The LB service developed within the EGEE project provides a distributed, scalable solution able to deal with hundreds of thousands of jobs on large Grids. However, to provide the necessary scalability and not to slow down the processing of jobs within the middleware, it is based on a non-blocking, asynchronous model. This means that the order of events sent to LB by individual parts of the middleware (user interface, scheduler, computing element, ...) is not guaranteed. While dealing with such out-of-order events, the LB may provide information that looks inconsistent with the knowledge the user has from some other source (e.g. an independent notification about the job state). The lecture will present the internal design of LB, and we will discuss how the LB results (i.e. the job state) should be interpreted. While LB deals with active jobs only, the Job Provenance (JP) is designed to store indefinitely information about all jobs that have run on a Grid. All the relevant information needed to re-submit the job in the same environment is stored, including the computing environment specification. Users can annotate stored records, providing yet another metadata layer useful e.g. for job grouping and data mining over the JP. We will provide basic information about the JP and its use, looking for feedback for its improvement.
Speaker: Prof. Ludek Matyska (CESNET, z.s.p.o.)
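A small sketch of the interpretation problem discussed above: if the job state is computed naively from the last event received, out-of-order delivery gives a misleading answer, whereas ordering events by a logical sequence number before reducing them gives a consistent one. The event names and sequence numbers are illustrative, not the LB schema.

    # Illustration of out-of-order event handling (not the LB implementation).
    events_received = [           # (logical sequence number, event), in arrival order
        (3, "Running"),
        (1, "Submitted"),
        (4, "Done"),
        (2, "Scheduled"),
    ]

    naive_state = events_received[-1][1]          # wrong: depends on arrival order
    consistent_state = max(events_received)[1]    # order by sequence number first
    print("last arrived:", naive_state, "| highest sequence:", consistent_state)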
• 64
Speaker: Francesco Giacomini (Istituto Nazionale di Fisica Nucleare (INFN))
• 65
BOSS: the CMS interface for job submission, monitoring and bookkeeping
BOSS (Batch Object Submission System) has been developed in the context of the CMS experiment to provide logging, bookkeeping and real-time monitoring of jobs submitted to a local farm or a grid system. The information is persistently stored in a relational database (currently MySQL or SQLite) for further processing. In this way the information that was available in the log file in free form is structured in a fixed form that allows easy and efficient access. The database is local to the user environment and is not required to provide server capabilities to the external world: the only component that interacts with it is the BOSS client process. BOSS can log not only the typical information provided by batch systems (e.g. executable name, time of submission and execution, return status, etc.), but also information specific to the job that is being executed (e.g. the dataset that is being produced or analyzed, the number of events done so far, the number of events to be done, etc.). This is done by means of user-supplied filters: BOSS extracts the specific user-program information to be logged from the standard streams of the job itself, filling a fixed-form journal file to be retrieved and processed at the end of the job run via the BOSS client process. BOSS interfaces to a local or grid scheduler (e.g. LSF, PBS, Condor, LCG, etc.) through a set of scripts provided by the system administrator, using a predefined syntax. This allows hiding the implementation details from the upper layers, in particular whether the batch system is local or distributed. The interface provides the capability to register, un-register and list the schedulers. BOSS provides an interface to the local scheduler for the operations of job submission, deletion, querying and output retrieval. At output retrieval time the information in the database is updated using information sent back with the job. BOSS also provides an optional run-time monitoring system that, working in parallel to the logging system, collects information while the computational program is still running and presents it to the upper layers through the same interface. The real-time information sent by the running jobs is collected in a separate database server; the same real-time database server may support more than one BOSS database. The information in the real-time database server has a limited lifetime: in general it is deleted after the user has accessed it, and in any case after successful retrieval of the journal file. It is not possible to use the information in the real-time database server to update the logging information in the BOSS database once the journal file for the related job has been processed. The run-time monitoring is done through a client-updater pair registered as a plug-in module: they are the only components that interact with the real-time database. The real-time updater is a client of the real-time database server: it sends the information of the journal file to the server at pre-defined intervals of time. The real-time client is a tool used by BOSS to update its database using the real-time information.
The interface with the user is provided through: a command line, kept as similar as possible to that of previous versions, which is the minimal way to access BOSS functionality and gives a straightforward test and training instrument; a C++ API, increasing functionality and ease of use for programs using BOSS, which is currently under development and is meant to grow with the users' requirements; and a Python API, giving almost the same functionality as the C++ one, plus the possibility of running BOSS from a Python command line. User programs may be chained together to be executed by a single batch unit (job). The relational structure supports not only multiple programs per job (program chains) but also multiple jobs per chain (in the event of job resubmission). Homogeneous jobs, or better "chains of programs", may be grouped together in tasks (e.g. as a consequence of the splitting of a single processing chain into many processing chains that may run in parallel). The description of a task is passed to BOSS through an XML file, since it can model its hierarchical structure in a natural way. The process submitted to the batch scheduler is the BOSS job wrapper. All interactions of the batch scheduler with the user process pass through the BOSS wrapper. The BOSS job wrapper starts the chosen chaining tool, and optionally the real-time updater. An internal tool for chaining programs linearly is implemented in BOSS, but in the future external chaining tools may be registered with BOSS so that more complex chaining rules may be requested by the users. BOSS will not need to know how they work and will just pass any configuration information transparently down to them. The chaining tool starts a BOSS “program wrapper” for each user program. The program wrapper starts all the processes needed to get the run-time information from the user programs into the journal file. This program wrapper is unique, and it has to be started by passing only one parameter, the program id. The BOSS client determines finished jobs by a query to the scheduler. It retrieves the output for those jobs and uses the information in the journal file to update the BOSS database. The BOSS client pops the information about running jobs from the real-time database server through the client part of the registered Real Time Monitor. It also deletes from the server the information concerning jobs for which the BOSS database has already been updated using the journal file. The information extracted from the real-time database server may be used to update the local BOSS database or just to show the latest status to the user.
Speaker: Giuseppe Codispoti (Universita di Bologna)
• 66
MOTEUR: a data intensive service-based workflow engine enactor
Speaker: Tristan Glatard (CNRS)
• 4:00 PM
Coffee break
• 67
K-Wf Grid: Knowledge-based Workflows in Grid
Speaker: Ladislav Hluchy (Institute of Informatics, Slovakia)
• 68
G-PBox: A framework for grid policy management
Sharing computing and storage resources among multiple Virtual Organizations, which group people from different institutions often spanning many countries, requires a comprehensive policy management framework. This paper introduces G-PBox, a tool for the management of policies which integrates with other VO-based tools such as VOMS (an attribute authority) and DGAS (an accounting system) to provide a framework for writing, administering and utilizing policies in a Grid environment.
Speaker: Mr Andrea Caltroni (INFN)
• 69
Title: "IBM strategic directions in workload virtualization"
"Workload virtualization is made of several disciplines: job/workflow scheduling, workload management, and provisioning. Much work has been spent so far on these various components in isolation. A better synergistic integration of these components allowing their interoperability towards an optimized resource allocation in order to satisfy user specified service level objectives is necessary. Other challenges in the grid space deal with being able to allow meta-scheduling and adaptive/dynamic workflow scheduling. In this talk, we present IBM strategic directions in the workload virtualization area. We also briefly introduce our current product portfolio in that space and describe how it may evolve over time, based on customer requirements and additional business value their satisfaction could provide them."
Speaker: Dr Jean-Pierre Prost (IBM Montpellier)
• 2b: Data access on the grid 40-SS-D01

40-SS-D01

CERN

• 70
GDSE: A new data source oriented computing element for Grid
Speaker: Dr Giuliano Taffoni (INAF - SI)
• 71
Development of gLite Web Service Based Security Components for the ATLAS Metadata Interface
Speaker: Mr Thomas Doherty (University of Glasgow)
• 72
We present the ARDA Metadata Grid Application (AMGA), which is part of the gLite middleware. AMGA provides a lightweight service to manage, store and retrieve simple relational data on the grid, termed metadata. In this presentation we will first give an overview of AMGA's design, functionality, implementation and security features. AMGA was designed in close collaboration with the different EGEE user communities and combines high performance, which was very important to the high-energy physics community, with the fine-grained access restrictions required in particular by the BioMedical community. These access restrictions also make full use of the EGEE VOMS services and are based on grid certificates. To show to what extent the users' requirements have been met, we will present performance measurements as well as use cases for the security features. Several applications are currently using AMGA to store their metadata. Among them are the MDM (Medical Data Manager) application implemented by the BioMedical community, the GANGA physics analysis tool from the ATLAS and LHCb experiments, and a Digital Library from the generic applications. The MDM application uses AMGA to store relational information on medical images stored on the grid, plus information on patients and doctors, in several tables. User applications can retrieve images based on their metadata for further processing. Access restrictions are of the highest importance to the MDM application because the stored data is highly confidential; MDM therefore makes use of the fine-grained access restrictions of AMGA. The GANGA application uses AMGA to store the status information of jobs running on the grid which can be controlled by GANGA. AMGA's simple relational database features are mainly used to ensure consistency when several GANGA clients of the same user access the stored information remotely. Finally, the Digital Library project makes similar use of AMGA as the MDM application but provides many different schemas to store not only images but also information on texts, movies or music. Another difference is that there is only a central librarian updating the library, while for MDM updates are triggered by the many image acquisition systems themselves. This presentation will also discuss future developments of AMGA, in particular its features to replicate or federate metadata. These will mainly allow users to benefit from better scaling behaviour, but could also improve security by using federation to physically separate metadata. The replication features will be compared to current proprietary solutions. AMGA provides a very lightweight metadata service as well as basic database access functionality on the Grid. After a brief overview of AMGA's design, functionality, implementation and security features we will show performance comparisons of AMGA with direct database access as well as with other Grid catalogue services. Finally the replication features of AMGA are presented and a comparison made with proprietary database replication solutions.
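The relational-metadata idea behind the MDM use case can be illustrated with a small local stand-in. The sketch below deliberately uses Python's sqlite3 module rather than the AMGA client API (which is not reproduced here); the table and attribute names are purely illustrative.

```python
import sqlite3

# Local stand-in for a relational metadata catalogue; not the AMGA API.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE images (
        lfn TEXT PRIMARY KEY,       -- logical file name of the image on the grid
        patient_id TEXT,
        modality TEXT,
        acquisition_date TEXT
    )
""")
conn.execute(
    "INSERT INTO images VALUES (?, ?, ?, ?)",
    ("lfn:/grid/biomed/img_0001", "P-042", "MRI", "2006-01-15"),
)

# Retrieve images based on their metadata, as the MDM use case does.
for (lfn,) in conn.execute(
    "SELECT lfn FROM images WHERE modality = ? AND acquisition_date > ?",
    ("MRI", "2006-01-01"),
):
    print(lfn)
```

In the real service the query would of course be mediated by AMGA, which adds the grid-certificate-based, fine-grained access control described in the abstract on top of such a relational schema.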
Speaker: Dr Birger Koblitz (CERN-IT)
• 73
Use of Oracle software in the CERN Grid
Oracle is known as a database vendor, but has much more to offer than data storage solutions. Some key Oracle products that are in use or are currently being full-scale tested at CERN will be discussed in this talk. It will primarily be an open discussion, and interactive feedback from the audience is more than welcome. The following topics will be discussed. Oracle Client Software distribution: how can a large to huge number of systems easily be enabled to connect to Oracle database servers; what are the distribution rights, and how is the software actually distributed and configured. Oracle support for Linux: Oracle officially supports those Linux distributions that are in widespread use and strongly recommends that servers be run on supported distributions; this does not, however, imply that other Linux distributions cannot be used at all, and the talk will elaborate on this. Oracle Streams replication: the various possibilities for using Oracle Streams to replicate large amounts of data will be discussed.
Speaker: Bjorn Engsig (ORACLE)
• 74
Discussion
• 4:00 PM
break
• 75
The gLite File Transfer Service
• 76
Encrypted Data Storage in EGEE
Speaker: Ákos Frohner (CERN)
• 77
Use of the Storage Resource Manager Interface
SRM v2.1 features and status: Version 2.1 of the Storage Resource Manager interface offers various features that are desired by EGEE VOs, particularly the HEP experiments: pinning and unpinning of files, relative paths, (VOMS) ACL support, directory operations and global space reservation. The features are described in the context of actual use cases and their availability in the following widely used SRM implementations: CASTOR, dCache, DPM. The interoperability of the different implementations and SRM versions is discussed, along with the absence of desirable features like quotas. Version 1.1 of the SRM standard is in widespread use, but has various deficiencies that are addressed to a certain extent by version 2.1. The two versions are incompatible, requiring clients and servers to maintain both interfaces, at least for a while. Certain problems will only be dealt with in version 3, whose definition may not be completed for many months. There are various implementations of versions 1 and 2, developed by different collaborations for different user communities and service providers, with different requirements and priorities. In general a VO will have inhomogeneous storage resources, but a common SRM standard should make them compatible, such that data management tools and procedures need not bother with the actual types of the storage facilities.
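As a rough illustration of the v2.1 feature set listed above, the following Python stub sketches the shape of a client interface covering those operations. The class and method names are hypothetical and do not correspond to any actual SRM client library; the stub only maps the named features onto calls.

```python
class SRMClient:
    """Hypothetical wrapper around an SRM v2.1 endpoint (illustrative only)."""

    def __init__(self, endpoint):
        self.endpoint = endpoint

    def reserve_space(self, size_bytes, lifetime_s):
        """Global space reservation; would return a space token."""
        ...

    def prepare_to_get(self, surl, space_token=None, pin_lifetime_s=3600):
        """Stage a file and pin it for the given lifetime; would return a TURL."""
        ...

    def release(self, surl):
        """Unpin a previously pinned file."""
        ...

    def mkdir(self, relative_path):
        """Directory operation, using a path relative to the VO base directory."""
        ...
```

A common interface of this shape, implemented by CASTOR, dCache and DPM alike, is what allows data management tools to ignore the actual type of the underlying storage facility.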
Speaker: Maarten Litmaath (CERN)
• 78
Discussion
Discussion on grid data management
• 79
Space Physics Interactive Data Resource - SPIDR
Speakers: Mr Dmitry Mishin (Institute of Physics of the Earth Russian Acad. Sci.), Dr Mikhail Zhizhin (Geophysical Center Russian Acad. Sci.)
• 80
gLibrary: a Multimedia Contents Management System on the grid
Speaker: Dr Tony Calanducci (INFN Catania)
• 81
Discussion
Discussion on application data management
• 2c: Special type of jobs (MPI, SDJ, interactive jobs, ...) - Information systems 40/4-C01

40/4-C01

CERN

30
Show room on map
• 82
Scheduling Interactive Jobs
1. Introduction In the 1970s, the transition from batch systems to interactive computing was the enabling tool for the widespread diffusion of advances in IC technology. Grids are facing the same challenge. The exponential coefficients in network performance enable the virtualization and pooling of processors and storage; large-scale user involvement might require seamless integration of the grid power into everyday use. In this paper, interaction is a short name for all situations of a display-action loop, ranging from a code-test-debug process in plain ASCII, to computational steering through virtual/augmented reality interfaces, as well as portal access to grid resources, or complex and partially local workflows. At various levels, the EGEE HEP and biomedical communities provide examples of the requirement of a turnaround time at the human scale. Section 2 will provide experimental evidence of this fact. Virtual machines provide a powerful new layer of abstraction in distributed computing environments. The freedom of scheduling and even migrating an entire OS and associated computations considerably eases the coexistence of deadline-bound short jobs and long-running batch jobs. The EGEE execution model is not based on such virtual machines, thus the scheduling issues must be addressed through the standard middleware components, the broker and the local schedulers. Sections 3 and 4 will demonstrate that QoS and fast turnaround time are indeed feasible within these constraints. 2. EGEE usage The current use of EGEE makes a strong case for specific support for short jobs. Through the analysis of the LB log of a broker, we can provide quantitative data to support this claim. The broker logged is grid09.lal.in2p3.fr, running successive versions of LCG; the trace covers one year (October 2004 to October 2005), with 66 distinct users and more than 90000 successful jobs, all production. This trace provides both the job intrinsic execution time $t$ (evaluated as the timestamp of event 10/LRMS minus the timestamp of event 8/LRMS) and the makespan $m$, that is the time from submission to completion (evaluated as the timestamp of event 10/LogMonitor minus the timestamp of event 17/UI). The intrinsic execution time might be overestimated if the sites where the job is run accept concurrent execution. The striking fact is the very large number of extremely short jobs. We call Short Deadline Jobs (SDJ) those where $t$ < 10 minutes, and Medium Jobs (MJ) those with $t$ between ten minutes and one hour. SDJ account for more than 90% of the total number of jobs, and consume nearly 20% of the total execution time, in the same range as jobs with $t$ less than one hour (17%). Next, we consider the overhead $o = (m-t)/t$. As usual, the overhead decreases with execution time, but for SDJ the overhead is often many orders of magnitude greater than $t$. For MJ, the overhead is of the same order of magnitude as $t$. Thus, the EGEE service for SDJ is seriously insufficient. One could argue that bundling many SDJ into one MJ could lower the overhead. However, interactivity would not be reached, because results would also come in a bundle: for graphical interactivity, the result must obviously be pipelined with visualization; in the test-debug-correct cycle, there might not be very many jobs to run.
With respect to grid management, an interactivity situation translates into a QoS requirement: just as video rendering or music playing requires special scheduling on a personal computer, or video streaming requires differentiated network services, servicing SDJ requires a specific grid guarantee, namely a small bound on the makespan, which is usually known as a deadline in the framework of QoS. The overhead has two components: first the queuing time, and second the cost of traversal of the middleware protocol stack. The first issue is related to the grid scheduling policy, while the second is related to the grid scheduling implementation. 3. A Scheduling Policy for SDJ Deadline scheduling usually relies on the concept of breaking the allocation of resources into quanta, of time for a processor, or through packet slots for network routing. For job scheduling, the problem is a priori much more difficult, because jobs are not partitionable: except for checkpointable jobs, a job that has started running cannot be suspended and restarted later. Condor has pioneered migration-based environments, which provide such a feature transparently, but deploying constrained suspension in EGEE would be much too invasive with respect to the existing middleware. Thus, SDJ should not be queued at all, which seems to be incompatible with the most basic mechanism of grid scheduling policies. The EGEE scheduling policy is largely decentralized: all queues are located on the sites, and the actual time scheduling is enacted by the local schedulers. Most often, these schedulers do not allow time-sharing (except for monitoring). The key for servicing SDJ is to allow controlled time-sharing, which transparently extends the kernel's multiplexing to jobs, through a combination of processor virtualization and permanent slot reservation. The SDJ scheduling system has two components. - A local component, composed of dedicated single-entry queues and a configuration of the local scheduler. Technical details can be found at http://egee-na4.ct.infn.it/wiki/index.php/ShortJobs. It ensures the following properties: the delay incurred by batch jobs is at most doubled; the resource usage is not degraded, e.g. by idling processors; and finally the policies governing resource sharing (VOs, EGEE and non-EGEE users, ...) are not impacted. - A global component, composed of job typing and a mapping policy at the broker level. While it is easy to ensure that SDJ are directed to resources accepting SDJ, LCG and gLite do not provide the means to prevent non-SDJ jobs from using the SDJ queues, and this requires a minor modification of the broker code. It must be noticed that no explicit user reservation is required: seamless integration also means that explicit advance reservation is no more applicable than it would be for accessing a personal computer or a video-on-demand service. In the most frequent case, SDJ will run under the best-effort Linux scheduling policy (SCHED_OTHER); however, if hard real-time constraints must be met, this scheme is fully compatible with preemption (the SCHED_FIFO or SCHED_RR policies). In any case, the limits on resource usage (e.g. as enforced by Maui) implement access control; thus the job might be rejected. The WMS notifies the rejection to the application, which can decide on the most adequate reaction, for instance submission as a normal job or switching to local computation.
4. User-level scheduling Recent reports (gLite WMS Test) show an impressively low middleware penalty, on the order of a few seconds, which should be available in gLite 3.0. They also hint that the broker is not too heavily impacted by many simultaneous accesses. However, for ultra-small jobs, with execution times of the same order (XXSDJ), even this penalty is too high. Moreover, the notification time remains on the order of minutes. In the gPTM3D project, we have shown that an additional layer of user-level scheduling provides a solution which is fully compatible with the EGEE organization of sharing. The scheduling and execution agents are quite different from those in Dirac: they do not constitute a permanent overlay, but are launched just like any LCG/gLite job, namely as an SDJ job; moreover, they work in connected mode, more like glogin-based applications. Besides this particular case, an open issue is the internal SDJ scheduling. Consider for instance a portal where many users ask for a continuous stream of executions of SDJ (whether XXSDJ or regular SDJ). The portal could dynamically launch such scheduling/worker agents and delegate to them the implementation of the so-called (period, slice) model used in soft real-time scheduling.
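The overhead analysis of Section 2 can be reproduced with a short script. The sketch below (in Python, with hypothetical field names for the LB event timestamps) computes the intrinsic execution time, the makespan and the overhead $o = (m-t)/t$ of a single job, and classifies the job as SDJ or MJ following the definitions above.

```python
from datetime import datetime

def job_metrics(events):
    """Return (t, m, o): intrinsic execution time, makespan and overhead,
    from a dict mapping LB event names to ISO timestamps (names are
    assumptions based on the abstract's description)."""
    fmt = "%Y-%m-%dT%H:%M:%S"
    ts = {name: datetime.strptime(stamp, fmt) for name, stamp in events.items()}
    t = (ts["10/LRMS"] - ts["8/LRMS"]).total_seconds()        # execution time
    m = (ts["10/LogMonitor"] - ts["17/UI"]).total_seconds()   # makespan
    o = (m - t) / t if t > 0 else float("inf")                # overhead
    return t, m, o

def classify(t):
    """Short Deadline Job (< 10 min), Medium Job (10 min - 1 h), else long."""
    if t < 600:
        return "SDJ"
    if t < 3600:
        return "MJ"
    return "long"

# Example job: 2 minutes of execution inside a 12-minute makespan.
events = {
    "17/UI": "2005-06-01T10:00:00",
    "8/LRMS": "2005-06-01T10:07:30",
    "10/LRMS": "2005-06-01T10:09:30",
    "10/LogMonitor": "2005-06-01T10:12:00",
}
t, m, o = job_metrics(events)
print(classify(t), f"t={t:.0f}s m={m:.0f}s overhead={o:.1f}")
```

Applied over the year-long broker trace, this is the kind of per-job calculation that yields the SDJ/MJ population and overhead figures quoted in Section 2.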
Speaker: Cecile Germain-Renaud (LRI and LAL)
• 83
Real time computing for financial applications
Speaker: Dr Stefano Cozzini (CNR-INFM Democritos and ICTP)
• 84
Grid-Enabled Remote Instrumentation with Distributed Control and Computation
Speaker: Luke Dickens (Imperial College)
• 85
Efficient job handling in the GRID: short deadline, interactivity, fault tolerance and parallelism
The major GRID infrastructures are designed mainly for batch-oriented computing with coarse-grained jobs and relatively high job turnaround time. However, many practical applications in the natural and physical sciences may be easily parallelized and run as a set of smaller tasks which require little or no synchronization and which may be scheduled in a more efficient way. The Distributed Analysis Environment Framework (DIANE) is a Master-Worker execution skeleton for applications which complements the GRID middleware stack. Automatic failure recovery and task dispatching policies enable easy customization of the behaviour of the framework in a dynamic and non-reliable computing environment. We report on the experience of using the framework with several diverse real-life applications, including Monte Carlo simulation, physics data analysis and biotechnology. Interfacing existing sequential applications, including legacy applications, is made easy from the point of view of a non-expert user. We analyze the runtime efficiency and load balancing of the parallel tasks in various configurations and diverse computing environments: GRIDs (LCG, CrossGrid), batch farms and dedicated clusters. In practice, the use of the Master/Worker layer allows the job turnaround time to be dramatically reduced, a scenario suitable for short-deadline jobs and interactive data analysis. Finally, it is also possible to easily introduce more complex synchronization patterns, beyond trivial parallelism, such as arbitrary dependency graphs (including cycles, in contrast to DAGs), which may be suitable for bio-informatics applications.
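The Master-Worker pattern with automatic failure recovery described above can be sketched in a few lines. The following Python example is a generic, thread-based illustration with simple requeue-on-failure; it is not DIANE's actual API, and the function names are illustrative.

```python
import queue
import threading

def run_master_worker(tasks, run_task, n_workers=4, max_retries=2):
    """Dispatch independent tasks to workers; failed tasks are requeued
    up to max_retries times (a very simplified failure-recovery policy)."""
    todo = queue.Queue()
    results = {}
    for task in tasks:
        todo.put((task, 0))

    def worker():
        while True:
            try:
                task, attempt = todo.get_nowait()
            except queue.Empty:
                return
            try:
                results[task] = run_task(task)        # one unit of work
            except Exception:
                if attempt < max_retries:
                    todo.put((task, attempt + 1))      # automatic failure recovery
            finally:
                todo.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Example: square a list of numbers as independent, unsynchronized tasks.
print(run_master_worker(range(10), lambda x: x * x))
```

In a grid setting the workers would of course be jobs running on remote resources rather than local threads, but the dispatch-and-retry logic that shortens the effective turnaround time is the same.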
Speaker: Mr Jakub MOSCICKI (CERN)
• 4:00 PM
Coffee break
• 86
Grid Computing and Online Games
Speaker: Mr Rafael Garcia Leiva (Adago Ingenieria)
• 87
User Applications of R-GMA
Speaker: Dr Steve Fisher (RAL)
• 5:30 PM
Final discussion on the session topics
• 2d: VO tools - Portals 40-S2-A01

40-S2-A01

CERN

• 2:00 PM
Introduction
• 2:05 PM
VO Support
• 88
Experience Supporting the Integration of LHC Experiments Software Framework with the LCG Middleware
The LHC experiments are currently preparing for data acquisition in 2007 and, because of the large amount of required computing and storage resources, have decided to embrace the grid paradigm. The LHC Computing Grid project (LCG) provides and operates a computing infrastructure suitable for data handling, Monte Carlo production and analysis. While LCG offers a set of high-level services, intended to be generic enough to accommodate the needs of different Virtual Organizations, the LHC experiments' software frameworks and applications are very specific and focused on their computing and data models. The LCG Experiment Integration Support (EIS) team works in close contact with the experiments, the middleware developers and the LCG certification and operations teams to integrate the underlying grid middleware with the experiment-specific components. This strategic position between the experiments and the middleware suppliers allows the EIS team to play a key role in the communication between the customers and the service providers. This activity is the source of many improvements on the middleware side, especially by channelling the experience and the requirements of the LHC experiments. The scope of the EIS activity encompasses several areas: 1) understanding the experiments' needs; 2) identifying open issues and possible solutions; 3) developing specific interfaces, services and components (when missing or not yet satisfactory); 4) providing operational support during Data Challenges, Service Challenges and massive productions; 5) providing and maintaining the user documentation; 6) providing tutorials for the user community. In the last year, the focus has also been extended to non-High-Energy-Physics communities such as Biomed, GEANT4 and UNOSAT. In this work we discuss the EIS experience, describing the issues arising in the organization of Virtual Organization support and the achievements, together with the lessons learned. This activity will continue in the framework of EGEE-II, and we believe it could serve as an example for several user communities of how to optimise their uptake of grid technology in the most efficient way.
Speaker: Dr roberto santinelli (CERN/IT/PSS)
• 89
User and virtual organisation support in EGEE
Speaker: Flavia Donno (CERN)
• 2:45 PM
Discussion
• 3:00 PM
VO Portals
• 90
EnginFrame as FrameWork for Grid Enabled Web Portals on industrial and research contexts.
Speakers: Alberto Falzone (NICE srl), Andrea Rodolico (NICE srl)
• 3:20 PM
Discussion
• 3:30 PM
VO Monitoring
• 91
GridICE monitoring for the EGEE infrastructure
Speaker: Mr Sergio Andreozzi (INFN-CNAF)
• 3:50 PM
Discussion
• 4:00 PM
Coffee break
• 4:30 PM
VO Software Management
• 92
Supporting legacy code applications on EGEE VOs by GEMLCA and the P-GRADE portal
Speaker: Mr Gergely Sipos (MTA SZTAKI)
• 93
ETICS: eInfrastructure for Testing, Integration and Configuration of Software
• 5:05 PM
Discussion
• 5:15 PM
Other Tools and Infrastructures
• 94
Universal Accessibility to the Grid via Metagrid Infrastructure
Speaker: Dr Soha Maad (Trinity College Dublin)
• 95
Methodology for Virtual Organization Design and Management
Speaker: Mr Lukasz Skital (ACC Cyfronet AGH / University of Science and Technology)
• 96
Discussion
• 6:20 PM
Wrap-up and Conclusions
• Demo and poster session

• Friday, March 3
• User Forum Plenary 3 500/1-001 - Main Auditorium

500/1-001 - Main Auditorium

CERN

400
Show room on map
• 97
Summary of parallel session 2a
Speaker: Harald Kornmayer (Forschungszentrum Karlsruhe)
• 98
Summary of parallel session 2b
Speaker: Johan Montagnat (CNRS)
• 99
Summary of parallel session 2c
Speaker: Cal Loomis (LAL Orsay)
• 10:30 AM
Coffee break
• 100
Summary of parallel session 2d
Speaker: Flavia Donno (CERN)
• 101
EGEE Technical Coordination group
Speaker: Erwin Laure (CERN)
• 102
Long-term grid sustainability
Europe has invested heavily in developing Grid technology and infrastructures during the past years, with some impressive results. The EU EGEE Project (www.eu-egee.org), which provides a coordinating framework for national, regional and thematic Grids, has proved a vital catalyst and incubator for the success of establishing a working, large-scale, multi-science production Grid infrastructure. As the Virtual Organizations established by scientific communities move from testing their applications on the Grid to routine and daily usage, it becomes increasingly important and necessary to ensure the maintenance, reliability and adaptiveness of the Grid infrastructure. This is rather difficult with the usual (short) project funding cycles, which inhibit investment from long-term users and industry. The situation is in some ways analogous to that of scientific networks, where independent national initiatives led to common standards and ultimately the creation of the DANTE organization. A similar evolution needs to be planned now for Grids, i.e. National Grid Initiatives to guide Grid infrastructure deployment and operation at country level, and a central coordinating body to ensure long-term sustainability and interoperability.
Speaker: Prof. Dieter Kranzlmueller (Linz University and CERN)
• 103
Conference summary
Speaker: Massimo Lamanna (CERN)
• 1:00 PM
Lunch
• EGAAP open session 503/1-001 - Council Chamber

503/1-001 - Council Chamber

CERN

162
Show room on map
• 104
Introduction
• 105
Fusion Status Report
• 106
ARCHEOGRID Status Report
• 107
EUMEDGrid Status Report
• 108
EELA Status Report
• 109
EUchinagrid
• 110
Bioinfogrid
• 111
Discussion on EGAAP future in EGEE-II
• EGAAP open session: EGAAP Closed Session 503/1-001 - Council Chamber

503/1-001 - Council Chamber

CERN

162
Show room on map