Grid Computing: a new tool for Science and Innovation

Kulturni Dom of Veli Lošinj


Veli Lošinj, Croatia, August 25-29, 2009
Massimo Lamanna (CERN), Mirco Mazzucato (INFN)
Description
Access to massive computing resources and to large distributed data sources is becoming available to many scientific and technological activities. With the uptake of Grid technology and the progressive deployment of large infrastructure projects such as EGEE (Enabling Grids for E-sciencE) in Europe, NorduGrid in the Nordic countries and OSG in the US, many more scientific domains and technology-related activities have access to sizeable computing resources. The Grid offers an environment where diverse scientific and technological activities can exchange ideas and collaborate.
    • Opening and lectures (Chair: F. Bradamante)
      • 1
        Welcome addresses
        - Welcome from Losinj (R. Zugac, City of Losinj)
        - The ECSAC initiative (G.C. Ghirardi, University of Trieste)
        - Paolo Budinich and the Trieste System (F. Bradamante, Univ. and INFN Trieste)
      • 2
        Conference Objectives
        Speaker: Massimo Lamanna (CERN)
        Slides
      • 3
        Enabling applications on the Grid
        Grids have grown and consolidated over the last years to include over 90K CPU cores in 250+ computing centers, several PB of storage and 10K users in 150+ active VOs in the EGEE Grid alone. High-level tools above generic Grid middleware are essential for enabling applications efficiently and for giving end-users easy access to large computing capacity. Many tools have been created at CERN to support High-Energy Physics (HEP), and some of them have been reused in other areas of science and technology. These include Ganga, DIANE, AMGA and Dashboards. We show several examples of complex systems based on the Ganga and DIANE frameworks which are in use in HEP, and how they have been successfully applied in other contexts such as scientific applications in Theoretical Physics and Biomedical applications, to mention a few. The same tool is used to provide on-demand capacity for large engineering calculations (e.g. the International Telecommunication Union planning process) and for demanding software engineering tasks (e.g. Geant4 regression testing). The application porting strategy presented here may be flexibly customized to fit the needs of a particular user community, with software components chosen à la carte and without complex deployment and administrative overhead. We discuss the conditions to be met by applications for this strategy to be most efficient.
        Speaker: Jakub Moscicki (CERN)
        Slides
      • 4
        Grid initiatives and activities in Croatia
        This presentation gives an overview of the current grid activities in Croatia gathered around the Croatian National Grid Infrastructure (CRO NGI), and of the history of grid activities in Croatia. CRO NGI is an integrated distributed computing environment, consisting primarily of computing (processing) and data (disk and tape) resources located at geographically distributed sites within the Republic of Croatia. CRO NGI is a common resource of the scientific and academic community and represents the fundamental infrastructure for scientific research, the application of new technologies and the integration of Croatia and Croatian scientists into the European Research Area (ERA) and the European Higher Education Area (EHEA). The Coordinator of CRO NGI is the University Computing Centre of the University of Zagreb (SRCE). The CRO NGI Board, appointed by the minister responsible for science, the Council of Partners and the Council of Users take part in the management. CRO NGI is financed as a separate unit in the State Budget of the Republic of Croatia. The first CRO NGI Partner Contracts were signed on November 5, 2007 by four institutions, plus two Partners by default: the Ministry of Science and Education and CARNet, the Croatian NREN. The Croatian National Grid Initiative (CRO-GRID) is a voluntary association of academic and research institutions, government institutions and commercial companies gathered around CRO NGI. The founders and members of CRO-GRID are the institutions which took part in the poly-project CRO-GRID from 2004 to 2007.
        Speaker: Ivan Maric (SRCE, University computing centre, University of Zagreb)
        Slides
    • Lectures (Tuesday afternoon)
      • 5
        WISDOM: In-silico docking against neglected and emerging diseases
        Grid technology opens new perspectives for data analysis in life sciences. For several years now, e-infrastructures such as EGEE have allowed embarrassingly parallel computations to be deployed at a very large scale, opening new avenues for very CPU-demanding analyses such as high-throughput virtual screening. Recently, distributed data management technology has made significant progress towards secure sharing of databases at regional, national and international levels, which is of particular interest for medicine and healthcare. We will illustrate the current and future impact of grid e-infrastructures through a short presentation of three projects currently explored at LPC Clermont-Ferrand: the WISDOM collaboration on in silico drug discovery, a cancer surveillance network in Auvergne and an international Influenza A surveillance network.
        Speaker: Vincent Breton (IN2P3 Clermont-Ferrand)
        Slides
      • 6
        Protein folding: BEM algorithm on the Grid
        Speaker: Alessandro Laio (SISSA)
      • 7
        LIBI: experience with Bioinformatics Applications on EGEE
        The LIBI project (International Laboratory of Bioinformatics), funded by the Italian Ministry for Education, University and Research (MIUR), has chosen to use the EGEE Grid infrastructure for the execution of its High Throughput applications, i.e. applications that can be decomposed into many smaller independent elementary tasks. Examples of bioinformatics use cases requiring the submission of a large number of jobs to the grid will be given. The procedure used for porting bioinformatics applications to the grid will also be described. In particular, the Job Submission Tool (JST) will be presented: a tool which was used extensively inside the LIBI project for the management, bookkeeping and monitoring of large-scale submissions to the grid. Its recent developments, focused on providing a simple web-based interface for the tool, will also be covered. Examples of the specific procedures adopted for managing the large number of input and output files used and/or produced during job execution will be described in detail. Finally, the talk will cover some open issues on which work is ongoing in order to better fulfill the requirements coming from the user community.
        Speaker: Giacinto Donvito (INFN Bari)
        Slides
    • Lectures (Wednesday morning)
      • 8
        Grid Computing for Hadron Therapy Studies
        Irradiation with photons and electrons has been a commonly used technique for curative or palliative treatment of malignant tumours for many years. Clinical treatment with proton and ion beams as a promising radiotherapeutic modality has been explored during the last 60 years: protons were first clinically used in humans in the 1950s, and therapies using light ions have their origin in 1975. In Europe several new facilities for hadron therapy have been constructed or are foreseen to become operational in the next years. Examples of such facilities are CNAO in Italy, where the first patient should be treated in 2010, and MedAustron in Austria, which plans to start operation in 2012. Monte Carlo simulations are a useful tool to study source and target (phantom) configurations. These techniques complement treatment planning systems based on parametrization and allow precise scientific studies to be performed. They can also play a special role during commissioning of a facility, providing input for the configuration and operation even before the first beams are available. Detailed simulations require significant computing resources. As the simulations can be performed in long-running jobs which are independent, a grid infrastructure is well suited for this task. In Austria such simulations have been performed using the Ganga toolkit, which provides the researcher with an easy-to-use interface. The DIANE Distributed Analysis Environment has also been used to improve resource management for computing-intensive tasks. Other grid-based activities in the European context will also be presented.
        Speaker: Dietrich Liko (Austrian Academy of Sciences, Vienna)
        Slides
      • 9
        Tracking the Genetic Legacy of Past Human Populations through the Grid
        Knowing the past demography of human populations over the last 100,000 years is a fascinating but difficult endeavor. Inferring this past demography has classically been approached through data from the archaeological record, and more recently through genetic data from contemporary samples. Building realistic demographic models at the continental scale is also a necessary step toward improving current genomic methods that aim at finding genes under selection, which may be linked to genetic adaptations or disorders. In light of recent advances in Bayesian statistical inference, we discuss here the importance of considering spatially-explicit approaches for modeling population expansion and dispersal. Due to the large parameter space to explore and the computationally intensive spatial simulations, grid computing is an important tool for comparing several realistic scenarios of human evolution. Our main simulation tool, SPLATCHE (SPatiaL And Temporal Coalescences in Heterogeneous Environment), will be presented. SPLATCHE has been ported to the EGEE infrastructure. We will discuss the porting process and give several examples of how the tool has been used to shed light on important demographic and genetic processes that have occurred during the evolution of our species.
        Speaker: Nicolas Ray (UNEP and University of Geneva)
        Slides
      • 10
        Grid data repositories with AMGA and gLibrary
        The term "grid" often brings to mind powerful distributed computing resources used to run long-lasting simulations or storms of tasks. But "data" grids are no less important for real and virtual communities from the scientific and industrial worlds. Those communities may have a lot of data to share in a secure way and need to preserve and access those data easily and from anywhere. Because of the large amount of storage space available in distributed data grids, a mechanism to federate and describe those data repositories is needed. One approach to this problem is to make use of metadata catalogues. In this presentation, we present the AMGA Metadata Service, developed in the context of the EGEE project. This service makes it possible to add semantic "descriptions" (metadata) to data saved on data grids and to answer user and application queries against those metadata in order to retrieve the desired files easily and quickly. We will then illustrate some use cases and real applications from the communities that use this service to create their repositories. In particular, gLibrary, a system based on AMGA to create and manage digital libraries on the grid, will be shown, demonstrating how a real repository of ancient manuscripts has been implemented on a data grid.
        Speaker: Antonio Calanducci (INFN Catania)
        Slides
    • Lectures (Wednesday afternoon)
      • 11
        The NorduGrid project and the ARC middleware
        ARC is one of the top Grid middlewares in use today among several organizations, most notably CERN and the LHC experiments. Its very clear architecture, robust design, non-intrusive installation and portability make it popular among clusters in many countries. An overview of ARC, NorduGrid and associated projects will be presented, with the current status of operation, specific features and unique solutions. The development of the new major ARC release 1.0 will provide much higher extensibility and easier interoperation with other Grid solutions.
        Speaker: Andrej Filipcic (Jozef Stefan Institute, Ljubljana)
        Slides
      • 12
        Monitoring, Control and Optimization in large scale distributed systems
        An important part of managing large-scale, distributed data-processing facilities is a monitoring system for computing facilities, storage, networks, and the very large number of applications running on these systems in near real time. The monitoring information gathered for all the subsystems is essential for developing the required higher-level services (the components that provide decision support and some degree of automated decisions) and for maintaining and optimizing workflow in large-scale distributed systems. These management and global optimization functions are performed by higher-level agent-based services. To satisfy the demands of data-intensive applications, the high-level services we are developing provide synergetic relationships between the applications, the computing and storage facilities, and the network infrastructure. Current applications of these higher-level services include optimized dynamic routing, control, and optimization for large-scale data transfers on dedicated circuits, data-transfer scheduling, distributed job scheduling, and automated management of remote services among a large set of grid facilities.
        Speaker: Iosif Legrand (Caltech)
        Slides
      • 13
        Powering the Grid: the EGEE project and the gLite Middleware Stack
        Enabling Grids for E-sciencE (EGEE) is Europe's leading Grid computing project, providing a computing infrastructure covering about 300 sites for over 10,000 researchers world-wide, from fields as diverse as high-energy physics, earth and life sciences. The EGEE infrastructure is based on a Grid middleware stack called gLite, which is integrated, certified and distributed by the project itself. gLite provides basic services covering security, information system, accounting, computing and storage. In addition, gLite also includes higher-level services for job management, data catalogs and data replication, providing applications with complete end-to-end solutions. An overview of the EGEE project and of the gLite middleware will be presented, with their current status and an outlook on the expected evolution in the EGI era.
        Speaker: Francesco Giacomini (INFN Bologna and CERN)
        Slides
    • Lectures (Thursday morning)
      • 14
        Climate change computational challenges and the role of grid infrastructures
        One of the main challenges within the climate change debate is to provide accurate climate change information at the regional to local scales, so that policy decisions on adaptation and mitigation options can be taken at the national or sub-national level. This requires the use of high-resolution climate models as well as the completion of large ensembles of simulations to characterize the uncertainties in regional climate change projections. In addition, regional feedbacks due, for example, to land-use change and atmospheric aerosols can substantially modify the regional climate change signal, which implies the use of more comprehensive climate models than in the past. High spatial resolution, increased model complexity and the need for large ensembles of simulations require a massive increase in the computational resources needed to improve the usefulness and reliability of climate change projections. Grid infrastructures, with the large amount of computational resources available in terms of both computing power and storage, can provide optimal computational platforms to address this challenge. Some attempts in this direction have been made within the EU-IndiaGrid project and will be further improved in the second phase of the project, where we plan to use the EGEE/gLite and Garuda grid infrastructures jointly.
        Speaker: Filippo Giorgi (ICTP Physics of Weather and Climate Section)
        Slides
      • 15
        EnviroGrids: Sustainable Development of the Black Sea Catchment
        The Black Sea Catchment is recognized for its ecologically unsustainable development and inadequate resource management. The 4-year FP7-funded EnviroGRIDS project (start: April 2009, 27 partners) will address these issues by developing a Spatial Data Infrastructure (SDI) targeting this region and linked to the EGEE infrastructure. A large catalogue of environmental data sets (e.g. land use, hydrology, climate) will be gathered and used to perform distributed spatially-explicit simulations to build scenarios of key environmental changes. A high-resolution (sub-catchment spatial and daily temporal resolution) water balance model will be applied to the entire Black Sea catchment (2 million km²) using a gridified version of the Soil and Water Assessment Tool (SWAT). SWAT modules for uncertainty and sensitivity analysis will also be gridified, using GANGA for front-end job management. We will explain why the grid plays a key role in this project, and what the planned steps are to link a large SDI to the grid. Foreseen challenges will be discussed, along with the positive role the grid can play in relation to environmental data standardization and dissemination.
        Speaker: Nicolas Ray (UNEP and University of Geneva)
        Slides
      • 16
        Distributed Data Handling Infrastructures in Climatology and the Grid
        Modern coupled climate models mostly run on dedicated, tightly coupled HPC computers. They produce exponentially growing amounts of data. This model data is becoming important for a large, diverse group of people. A powerful data handling and processing infrastructure is needed to support climate scientists in finding, accessing and comparing data, as well as in generating new derived data products, e.g. for climate impact studies. Policy makers and the private sector also have an increasing demand for infrastructural facilities that make model data products easily accessible. Grid technology is one key component for building such an infrastructure. In this talk we present developments made in several national and international projects towards a distributed data handling and processing infrastructure. Experiences from the German C3Grid project and the prototype C3Grid/EGEE integration are summarized. Additionally, recent developments towards a worldwide climate model data infrastructure in the context of the international climate model intercomparison project (CMIP5) and the data handling effort for the next Intergovernmental Panel on Climate Change (IPCC) assessment report are presented.
        Speaker: Stephen Kindermann (DKRZ Hamburg)
        Slides
    • Hands-on session
      • 17
        Hands-on session
    • Lectures (Thursday afternoon)
      • 18
        Computational challenges for the Planck mission
        Planck is an ESA mission, launched on 14 May 2009 with a payload of two instruments, and is dedicated to the mapping of the microwave sky in 9 frequency bands in the 30-850 GHz range. After an introduction describing the basics of the Planck mission and the technical characteristics of the instruments, the continuing activities needed to perform end-to-end simulations of the full mission are discussed, with special emphasis on the computational challenges and on the porting of the simulation code to the Grid and HPC systems, together with a discussion of the solutions found.
        Speaker: Fabio Pasian (INAF OATrieste)
        Slides
      • 19
        Grid and Astrophysical Research: current activities and future perspectives
        The interest of the astronomical community in Grid technology dates back to the early 2000s, when various initiatives were undertaken in several countries to set up national Grid infrastructures. Meanwhile, some projects were funded by the European Commission with the aim of creating and maintaining a stable European Grid Infrastructure for the benefit of several scientific communities; in this process the HEP community at CERN played a leading role. The most important of these projects are certainly the various editions of the EGEE project, where the astronomical community has been present since 2004, when the first two astronomical applications were approved: the Major Atmospheric Gamma-ray Imaging Cherenkov Telescope and Planck. Since then the astronomical community has grown continuously in terms of applications, tools and services aimed at making the gridification of applications smoother and the Grid more appealing for users. The amount of shared resources within the Virtual Organizations has increased as well. The presentation aims at providing a complete overview of the most relevant activity carried out by the astronomical community in EGEE, especially during the third phase of the project, when the astronomical cluster was created as an aggregating body for all interested astronomical groups. This overview will highlight that the usage of the Grid by astronomers is often not traditional and requires non-conventional resources and services to be integrated in Grid infrastructures. Finally, the presentation will try to show possible future developments and perspectives with the Grid for the astronomical community now that the EGI era is approaching.
        Speaker: Claudio Vuerli (INAF OATrieste)
        Slides
      • 20
        Simulations at the nanoscale on the Grid using Quantum ESPRESSO
        First-principles simulations based on density-functional theory have become quite common in the study of matter at the nanoscale. Simulations of quite complex and realistic models are now within our reach, but such simulations are still considered HPC applications, requiring large vector or parallel computers with specialized hardware. In this talk I report on our experience of using the Grid for realistic computations at the nanoscale with the open-source Quantum ESPRESSO distribution of software. The chosen application, the calculation of phonon dispersions for a relatively complex crystal structure, is a prototype for simulations that are suitable for Grid computing, since it has moderate RAM requirements and long execution times, and it can be split into many semi-independent tasks. The Quantum ESPRESSO software, designed for execution on all kinds of machines from single PCs to massively parallel machines, was subject to minor modifications to simplify the automatic splitting into many subtasks. A Python interface takes care of scheduling tasks to the Grid, collecting results and re-scheduling failed tasks. Our experience shows that, in spite of the high failure rate, execution on the Grid compares favourably with MPI parallelization on conventional HPC hardware.
        Speaker: Paolo Giannozzi (University of Udine and CNR/INFM Democritos)
        Slides
    • Lectures (Friday morning)
      • 21
        Plasma Physics: Scientific and Computational Challenges
        In preparing for ITER, a number of computational challenges need to be overcome: individual parts of the problem are "grand challenge" problems in their own right, but they also need to be combined to prepare for simulations that encompass all the relevant space and time scales. The talk will describe the work being done by EUFORIA and the ITM to use a scientific workflow engine to launch parts of the work on local resources, on the Grid and on High-Performance Computers.
        Speaker: David Coster (MPI for Plasma Physics, Munich)
        Slides
      • 22
        Lattice QCD on the Grid
        Quantum Chromodynamics, the theory describing the interactions of quarks and gluons, is not amenable to an analytic solution but can be studied numerically via large-scale Monte Carlo simulations. We show why and how the particular question we posed, regarding the existence of a QCD chiral critical point, can be studied efficiently on the Grid.
        Speaker: Philippe de Forcrand (ETH and CERN)
        Slides
      • 23
        LHC and Grid Computing
        The Large Hadron Collider accelerator at CERN will produce proton-proton collisions at unprecedented energy and frequency, to be exploited by large experiments to study the intimate structure of matter and energy. Everything about the LHC is large, from its physical dimensions to the number of scientists, the amount of data to be processed and the number of computing sites involved. This presentation will explain how grid technology is enabling LHC experiments to provide a uniform working environment to their distributed communities and to achieve the needed processing throughput. We will also look at how the grid is acting as a powerful driver for communication and community building.
        Speaker: Stefano Belforte (INFN Trieste)
        Slides
    • Hands-on session
      • 24
        Hands-on session (4)
    • Lectures (Friday afternoon)
      • 25
        Collaboration with real applications: extending Grid technology
        Over the past years, the European Union has invested heavily in the development of an e-Infrastructure based upon Grid technologies and keeps investing to assure its maintenance. However, the adoption of Grid technologies by various scientific communities is still fairly slow due to their complexity. The DORII project aims to deploy e-Infrastructure for scientific communities with a need to access expensive instrumentation. Sharing these devices over dedicated networks and using them remotely has considerable advantages not only for users of the equipment but also for the instrument owners. Currently, DORII provides support to the earthquake science community (sensor networks), the environmental science community (coastal monitoring and oceanography) and the experimental science community (synchrotron beam-lines). Working closely with end-users, solutions are put in place that build upon the success of past and ongoing EC projects in areas such as remote instrumentation, interactivity, software frameworks for application developers and advanced networking technologies with EGEE-based middleware.
        Speaker: Milan Prica (Sincrotrone Trieste SCpA (Elettra))
        Slides
      • 26
        Fusion applications on the EGEE grid
        This work deals with the problems solved in fusion research by means of grid computing. The computing needs of fusion research are discussed, and the applications that have been ported to the grid are described together with their main physics results. The range of plasma physics research covered by this set of tools is analysed and the future of grid computing for fusion research is discussed. The possibility of establishing complex workflows between grid and high-performance computing (HPC) applications has also been explored.
        Speaker: Francisco Castejon (CIEMAT)
        Slides
    • Lectures and Conference Wrap-up
      • 27
        Extensions of the European Grid Infrastructure
        In the past years the European Commission has supported several so-called e-Infrastructures, not limited to Europe but also aiming to export and extend the model to other regions of the world. The presentation will first review the state of the art of those e-Infrastructures, showing their differences and commonalities, and then address the future challenges and the possible steps forward.
        Speaker: Federico Ruggieri (INFN Roma Tre)
        Slides
      • 28
        The SEE-GRID-SCI project
        An overview of Grid activities in South Eastern Europe will be given, covering the objectives, progress and main achievements of the SEE-GRID series of projects, which is currently in its third two-year phase under the name SEE-GRID-SCI. The evolution of the regional SEE Grid infrastructure will be presented, as well as the various applications developed and user communities supported by the project. Special emphasis will be given to the establishment and development of National Grid Initiatives in the region and to plans towards full integration into the EGI era.
        Speaker: Aleksander Belic (Institute of Physics Belgrade)
        Slides
      • 29
        E-infrastructures for Scientific Collaboration in the Trieste
        • a) The Trieste network and the Trieste GRID
          Speaker: Stefano Cozzini (SISSA and CNR/INFM Democritos)
          Slides
        • b) The INFN-Trieste Computing Farm
          Speaker: Venicio Duic (Istituto Nazionale di Fisica Nucleare (INFN))
          Slides
      • 30
        Towards a Sustainable Grid Infrastructure for Europe
        The European Grid Initiative is currently in transition from its design phase into its implementation phase. The vision of a shared pan-European grid infrastructure for scientific collaboration, which explicitly materialized almost three years ago, has been transformed into a description in the EGI Blueprint and accompanying deliverables of the EGI Design Study project. The EGI Blueprint identifies the major actors of the EGI infrastructure (the National Grid Initiatives, the middleware developers, users and also funding sources) and provides a description of their main roles. Apart from giving the general overview, the lecture will discuss the tasks these actors are expected to perform within the EGI. The financial framework and the associated business model will be presented, demonstrating the sustainability aspect of the EGI. The current developments, stemming from the major EGI-related projects that are under preparation for the current EC project call, will be used to show how the transition from current infrastructures to the EGI is being planned.
        Speaker: Ludek Matyska (Supercomputer Centre Brno)
      • 31
        Conference Wrap-Up
        Speaker: Massimo Lamanna (CERN)
        Slides