Workshop on Campus Grids and Scientific Applications

Europe/Zurich
Aula, University Campus Vienna

1090 Wien, Spitalgasse 2, Hof 1
Erwin Laure (EGEE/CERN), Wilfried Gansterer (University of Vienna)
Description
The Research Lab Computational Technologies and Applications of the Faculty of Computer Science at the University of Vienna, in cooperation with EGEE, is organizing a workshop on "Grid Computing and Scientific Applications" in Vienna, Austria on Thursday, January 31 and Friday, February 1, 2008. Please see the workshop homepage for further logistical information.
  • Thursday 31 January
    • 09:00 – 17:30
      Day 1 - Thursday
      • 09:00
        Opening 15m
        Speaker: Günter Haring (University of Vienna)
        Slides
      • 09:15
        Democratization of Computing Through Campus Grids 40m
        In recent years, a growing number of universities have deployed advanced distributed computing technologies to build powerful computing facilities on their campuses. These Campus Grids offer students and faculty from a broad range of disciplines easy access to local compute and storage resources. The Grid Laboratory of Wisconsin (GLOW) is an example of such a grid. GLOW is an NSF-funded, distributed facility on the University of Wisconsin–Madison campus. It is part of the newly formed Center for High Throughput Computing (CHTC) and consists of more than 2000 processing cores and 100 TB of storage located at six different sites. Since its inception in the winter of 2004, it has been serving a broad range of disciplines, from Biotechnology and Computer Sciences to Medical Physics and Economics. Each of the GLOW sites is configured as an autonomous, locally managed Condor pool that can operate independently when disconnected from the other sites. Under normal conditions, the six pools act like a single Condor system coordinated via a highly available campus-wide matchmaking service. On-campus users interact with GLOW through job managers located on their desktop computers or through community gateways. In the talk, we will present the principles that have guided us for more than two decades in developing our distributed computing technologies and high-throughput computing software tools. An overview of the GLOW high-throughput computing facilities will be provided. Capabilities to “elevate” local GLOW jobs to the national Open Science Grid (OSG) infrastructure will be discussed. These capabilities follow our long-standing “bottom-up” approach to the construction and operation of distributed computing infrastructures that maximize reachable capacity while preserving local access, environment, and autonomy.
        Speaker: Miron Livny (University of Wisconsin-Madison)
        Slides
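The matchmaking idea described in the abstract can be sketched in a few lines. This is a hypothetical, drastically simplified illustration, not Condor's actual ClassAd mechanism: jobs and machines advertise attribute dictionaries, and a central matchmaker pairs each job with a machine that satisfies its requirements. All names and attributes below are invented.

```python
# Toy Condor-style matchmaking sketch (hypothetical, simplified).
# Real ClassAds support arbitrary expressions, ranks, and two-way
# requirements; here a job merely lists minimum numeric attributes.

def matches(requirements, machine):
    """Check that every job requirement is met by the machine's attributes."""
    return all(machine.get(key, 0) >= value for key, value in requirements.items())

def matchmake(jobs, machines):
    """Greedily assign each job to the first free machine meeting its requirements."""
    free = list(machines)
    assignments = {}
    for job in jobs:
        for machine in free:
            if matches(job["requirements"], machine):
                assignments[job["id"]] = machine["name"]
                free.remove(machine)  # machine is now claimed
                break
    return assignments

jobs = [
    {"id": "sim-1", "requirements": {"memory_mb": 2048, "cores": 1}},
    {"id": "sim-2", "requirements": {"memory_mb": 8192, "cores": 4}},
]
machines = [
    {"name": "glow-node-a", "memory_mb": 4096, "cores": 2},
    {"name": "glow-node-b", "memory_mb": 16384, "cores": 8},
]
print(matchmake(jobs, machines))  # {'sim-1': 'glow-node-a', 'sim-2': 'glow-node-b'}
```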
      • 09:55
        Linking Chemical Space to Biological Space – Challenges for High-Throughput Computing 40m
        The discovery and development of new, safe drugs is very costly, and the rate of failure of drug candidates in late-phase clinical studies is high. Establishing new in silico techniques that improve the predictability of bioavailability and safety as early as possible in the drug discovery and development process is thus of major importance for overcoming this problem and improving research and development in Pharmacy. Furthermore, with the increasing knowledge provided by systems biology it has become evident that drug efficacy, specificity, and safety have to be treated at a systems level rather than at the level of single targets. Cancer, metabolic syndrome, and CNS diseases are just a few examples where considering the interplay of regulatory networks, pharmacological targets, off-target pharmacology, and bioavailability is mandatory for the successful development of new, safe medicines. Chemogenomics aims at mapping the chemical space onto the biological/pharmacological space in order to identify new lead compounds. Thus, knowledge of the chemical space covered by each single target is of major importance for selectivity and safety profiling. We used self-organising maps and chemical-similarity-based approaches to characterise the chemical space related to several drug transporters and to identify new lead structures via in silico screening of medium-sized compound libraries. Applying these concepts to the regulatory network of nuclear receptors, drug transporters, and metabolising enzymes, and utilising the whole drug-like chemical space, will require front-end high-throughput computing.
        Speaker: Gerhard Ecker (University of Vienna)
        Slides
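As a toy illustration of the chemical-similarity screening mentioned in the abstract, compounds can be reduced to sets of structural features ("fingerprints") and compared with the Tanimoto coefficient |A ∩ B| / |A ∪ B|. The feature sets and compound names below are invented; real fingerprints are far richer.

```python
# Minimal fingerprint-based similarity screen (illustrative only).

def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two feature sets."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

def screen(query_fp, library, threshold=0.5):
    """Return library compounds at least `threshold` similar to the query."""
    return [name for name, fp in library.items()
            if tanimoto(query_fp, fp) >= threshold]

query = {"aromatic_ring", "amine", "carboxyl"}
library = {
    "cmpd-1": {"aromatic_ring", "amine", "hydroxyl"},  # shares 2 of 4 features -> 0.5
    "cmpd-2": {"alkane", "ether"},                     # shares nothing -> 0.0
}
print(screen(query, library))  # ['cmpd-1']
```

Each similarity evaluation is independent, which is why such screens of large compound libraries parallelise naturally on a campus grid.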
      • 10:35
        Coffee 25m
      • 11:00
        Algorithmic Challenges in the CPAMMS Project 40m
        The CPAMMS project ("Computing Paradigms and Algorithms for Molecular Modelling and Simulation: Applications in Chemistry, Molecular Biology, and Pharmacy”) is an interdisciplinary research initiative involving participants from Quantum Chemistry, Molecular Biology, and Pharmacy which focuses on the development of innovative methods and technologies for selected computational molecular modeling and simulation problems. One of the central components of the project is the development of highly efficient algorithms for better utilizing the potential of distributed computational resources (in particular, parallel/distributed computer systems and computational grids). After an overview of the main research efforts in the CPAMMS project, we concentrate on the part of the project which deals with algorithmic challenges arising in the context of in silico screening for computational drug design. Of particular importance in this context are methods for selecting optimal descriptor sets and for mapping those back to the original (potentially much larger) set of physicochemically interpretable descriptors. Due to the size of the data sets and the high computational complexity of the methods involved, it is very important to efficiently utilize distributed computational resources. We will summarize state-of-the-art descriptor selection methods and discuss their respective potential and challenges for execution in distributed environments such as campus grids.
        Speaker: Wilfried Gansterer (University of Vienna)
        Slides
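One simple instance of the descriptor-selection methods surveyed in the talk is greedy forward selection, sketched below under invented assumptions: the additive `score` is a placeholder, where in practice it would be a cross-validated model fit, and the many independent score evaluations in each round are what a distributed environment can farm out.

```python
# Greedy forward descriptor selection (illustrative sketch, not the
# actual CPAMMS algorithms). Each round evaluates every remaining
# candidate descriptor; those evaluations are mutually independent
# and could run as separate grid jobs.

def forward_select(descriptors, score, k):
    """Greedily add the descriptor that most improves `score`, k times."""
    selected = []
    remaining = list(descriptors)
    for _ in range(k):
        best = max(remaining, key=lambda d: score(selected + [d]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy additive score over invented descriptor names.
weights = {"logP": 3.0, "weight": 2.0, "tpsa": 1.0, "rings": 0.5}
score = lambda subset: sum(weights[d] for d in subset)
print(forward_select(list(weights), score, 2))  # ['logP', 'weight']
```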
      • 11:40
        Use of Scientific Software in Distributed Environments - A Practical Experience with VGE and EGEE 40m
        The ab-initio program suite COLUMBUS was installed within the COMPCHEM-VO on the EGEE Grid and deployed as a VGE (Vienna Grid Environment) web service at local sites within the University of Vienna. Additionally, the molecular dynamics package NEWTON-X was deployed as a VGE web service. The two systems, VGE and EGEE, pursue different philosophies, representing a bottom-up vs. a top-down approach to the problem of accessing computational resources in heterogeneous environments. From the end user's point of view both systems have their advantages and downsides, but both are promising prototypes for future development.
        Speaker: Matthias Ruckenbauer (University of Vienna)
        Slides
      • 12:20
        Lunch 1h 25m
      • 13:45
        UNICORE – A European Grid Technology 40m
        The development of UNICORE started back in 1997 with two projects funded by the German Ministry of Education and Research (BMBF). UNICORE is a vertically integrated Grid middleware which provides seamless, secure, and intuitive access to distributed resources and data, offering components at all levels of a Grid architecture, from an easy-to-use graphical client down to the interfaces to the Grid resources. Furthermore, UNICORE has strong support for workflows, while security is established through X.509 certificates. Since 2002, UNICORE has been continuously improved to mature, production-ready quality and enhanced with additional functionality and standards in several European projects. Today UNICORE is used in several national and international Grid infrastructures such as D-Grid and DEISA, and also provides access to the national supercomputers of the NIC in Germany. The presentation includes a detailed introduction to the current version 6, which has a Web Services (WS-RF 1.2, SOAP, WS-I) stack and implements several open standards such as XACML, SAML, OGSA-RUS, OGSA-BES, JSDL, DRMAA, OGSA-ByteIO, and others. The European UNICORE Grid technology is available as open source from http://www.unicore.eu
        Speaker: Achim Streit (FZJ)
        Slides
      • 14:25
        Integrating Authentication and Authorization Infrastructures with Grids 40m
        The vision of grid computing and that of national authentication and authorization infrastructures (AAIs) have both become reality in the past few years. Whereas Grid security is based on X.509 certificates, AAIs are typically based on federating campus identity management systems. We review the current state of interoperability between Grids and AAIs, with a focus on Shibboleth, the most widely used AAI today.
        Speaker: Chad La Joie (Switch)
        Slides
      • 15:05
        Coffee 25m
      • 15:30
        EGEE: Providing a Production Grid Infrastructure for Collaborative Science 40m
        EGEE (Enabling Grids for eScience) operates a large-scale production Grid infrastructure federating over 250 sites from 45 countries world-wide, providing over 45000 CPUs and about 15 PB of disk storage to a wide variety of scientific applications. In this talk we review the challenges and successes of EGEE in building, operating, and evolving the Grid infrastructure and highlight a few example applications. In the second half of the talk we will discuss future directions of Grids in Europe, in particular how National and Regional Grid Infrastructures will pave the way to the sustainable provision of production Grids.
        Speaker: Erwin Laure (EGEE/CERN)
        Slides
      • 16:10
        Life Sciences Applications on a Computing Grid Infrastructure 40m
        Grids open exciting perspectives for addressing many of the challenges faced by biomedical research in terms of large-scale computing and data management. The e-science group at LPC Clermont-Ferrand (http://clrwww.in2p3.fr/PCSV) focuses on deploying applications and developing services relevant to life sciences and healthcare in grid environments. The group is actively involved in the deployment of large-scale distributed applications on the EGEE II grid infrastructure, a European project that aims at federating distributed resources, sharing data, and providing continuity of service using grid technology. The group also studies the development of bioinformatics services for complex data management and service discovery, and an efficient telemedicine application for sharing medical data in a secure way by exploiting grid capabilities. These applications use web services within the framework of EGEE, EUMedGRID, EELA, EUChinaGRID, and the EMBRACE European network of excellence. Within the EGEE II project, the team studies the impact of grids in reducing computing time on the GATE Monte Carlo simulation platform for radiotherapy and brachytherapy applications and on a wide in silico docking application on malaria (the WISDOM project). Both applications are split into many small jobs which are then deployed on the computing grid. Jobs are submitted through user-friendly, secure web portal interfaces. For both applications, the EGEE infrastructure enables very significant acceleration in computing time: up to a factor of 300 for the Monte Carlo simulations, and up to a factor of 2000 for the WISDOM application, reducing the calculations to 90 days instead of 412 years on a single CPU. In the near future, a web portal will make it possible for a physician or a bioinformatician to reach the grid resources using an internet connection.
        Speaker: Lydia Maigne (Unknown)
        Slides
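A quick back-of-the-envelope check of the WISDOM figures quoted in the abstract (assuming 365.25 days per year): 412 CPU-years compressed into 90 wall-clock days corresponds to a sustained speedup of roughly 1700, of the same order as the quoted peak factor of 2000.

```python
# Sustained speedup implied by the figures in the abstract.
single_cpu_days = 412 * 365.25   # 412 CPU-years expressed in days
grid_days = 90                   # wall-clock time on the grid
speedup = single_cpu_days / grid_days
print(round(speedup))  # 1672
```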
      • 16:50
        Integrated Biomedical Grid Infrastructure 40m
        The European @neurIST project aims to create an integrated biomedical Grid infrastructure for the management of all processes linked to research, diagnosis and treatment development for complex, multi-factorial diseases. The project bases its developments on a service-oriented Grid architecture encompassing data repositories, information systems, simulation, modeling and computational analysis services handling multi-scale, multi-modal information at distributed sites. In this talk we will provide an overview of the VGE Grid infrastructure and its utilization within the @neurIST project. VGE relies on standard Web services technologies for virtualizing compute-intensive applications and distributed data sources as Grid services that can be securely accessed on demand over the Internet. VGE compute services enable transparent access to high-end computing facilities, supporting dynamic SLA negotiation and QoS guarantees for time-critical usage scenarios. VGE data services facilitate the integration of heterogeneous data sources based on data mediation mechanisms realized on top of OGSA-DAI and OGSA-DQP. We will also outline the client-side environment offered by VGE for the construction of advanced Grid applications from data and compute services.
        Speaker: Siegfried Benkner (University of Vienna)
        Slides
  • Friday 1 February
    • 09:00 – 13:00
      Day 2 - Friday
      • 09:00
        Building Virtual Organizations Around Supercomputing Grids and Clouds 40m
        A virtual organization is a community of people who come together to solve a problem, share knowledge, or pursue a common interest. Grid infrastructures can support this concept. In the TeraGrid (the U.S. National Science Foundation supercomputer grid) this is done through the Science Gateway program. A different model is based on social networks, which allow people to easily form common-interest groups. “Cloud Computing” is based on the use of large, commercial data centers that allow applications to be uploaded and shared. In this talk we examine the architecture of Science Gateways for TeraGrid and explore ways that clouds can provide a more “frictionless” approach to solving the problem of both data and application sharing. We will also discuss this in the context of campus Grids and how support for VOs can be provided within a university.
        Speaker: Dennis Gannon (University of Indiana)
        Slides
      • 09:40
        PS3GRID.NET: Building a distributed supercomputer using the PlayStation 3 40m
        The new Cell processor in the PlayStation 3 is capable of improving the performance of compute-intensive applications by more than an order of magnitude, as recently demonstrated by the full-atom CellMD program for molecular dynamics (MD) applications to biomolecules. We have used CellMD to exploit this computational power via distributed computing over the Internet, using the computational infrastructure behind SETI@home, the Berkeley Open Infrastructure for Network Computing (BOINC). In PS3GRID, we explicitly target the Cell processors in Sony PlayStation 3 game consoles to compute the ion permeability of trans-membrane Gramicidin A by distributed steered molecular dynamics simulations. The availability of PlayStation 3 game consoles and the steered molecular dynamics protocol allow us to perform novel computational molecular experiments on biomolecules by harnessing enormous computational resources.
        Speaker: Gianni De Fabritiis (Pompeu Fabra University)
        Slides
      • 10:20
        Structuring and Sharing Medical Images on the Grid 40m
        Sharing and organising medical imaging knowledge is a key issue in medical research and training. Evidence-based medicine also demands high-quality, well-organised knowledge bases to check for second opinions and to drive diagnosis. However, sharing and organising medical imaging data is not straightforward: technological and legal problems in exchanging data make it difficult or even impossible with current infrastructures. Moreover, the index criteria used in clinical practice are inefficient when searching for knowledge. TRENCADIS (Towards a Grid Environment to Share and Process DICOM Objects) is a middleware developed at the Grid and High Performance Computing Research Group (GRyCAP, www.grycap.upv.es) of the Universidad Politécnica de Valencia (UPV, www.upv.es). Its first advantage lies in the organisation of data. The availability of a Grid platform to securely share cases will increase the significance of studies, through the enlargement of the study sample, and support learning through representative cases. Currently, data is organised by administrative and demographic keys, which prevents searching for a specific diagnosis. In order to implement this functionality, an authorisation architecture has been implemented to define workgroups, case access permissions, and relations. Encryption and shared-key management are necessary to prevent unauthorised access by users with administrative privileges. Relevant studies should be explicitly selected and carefully documented through structured reports. To ease this process, advanced post-processing tools are provided in the form of WSRF services. These Grid services provide users with access to advanced algorithms and computing resources.
        CVIMO (Valencian Cyber-infrastructure for Medical Imaging in Oncology) is a deployment of TRENCADIS to share and organise medical studies and reports based on ontologies constructed upon the fields of structured reports. This platform enables users to submit new cases, which are automatically organised according to the semantic criteria defined through the Virtual Organisations and groups. The platform provides a virtual data catalogue based on the metadata coming from the radiologists' evaluation reports.
        Speaker: Dr Ignacio Blanquer (UPV)
        Slides
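The shift from administrative to semantic organisation described in the abstract can be illustrated with a toy index over structured-report fields. All field names and values below are invented for illustration; TRENCADIS itself works on DICOM structured reports and ontologies.

```python
# Toy semantic index: group studies by a field of their structured
# report (e.g. diagnosis), instead of by administrative keys.

def index_by(studies, key):
    """Group study IDs by a semantic field of their structured report."""
    index = {}
    for study in studies:
        value = study["report"].get(key, "unknown")
        index.setdefault(value, []).append(study["id"])
    return index

studies = [
    {"id": "S1", "report": {"diagnosis": "aneurysm", "modality": "MR"}},
    {"id": "S2", "report": {"diagnosis": "aneurysm", "modality": "CT"}},
    {"id": "S3", "report": {"diagnosis": "tumour", "modality": "MR"}},
]
print(index_by(studies, "diagnosis"))
# {'aneurysm': ['S1', 'S2'], 'tumour': ['S3']}
```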
      • 11:00
        Coffee 20m
      • 11:20
        The European Grid Initiative - Integrating Grids in Europe 40m
        The European Grid Initiative (EGI) represents an effort to establish a sustainable grid infrastructure. Driven by the needs and requirements of the research community, it is expected to enable the next leap in research infrastructures, thereby supporting collaborative scientific discoveries. The main foundations of EGI are the National Grid Initiatives (NGIs), which operate the grid infrastructures in their respective countries. EGI will link existing NGIs and actively support the setup and initiation of new NGIs. This presentation covers the current ideas behind EGI and the planning for the next steps in the setup process. It should serve as the basis for discussing collaboration with NGIs and other grids in order to develop a global, sustainable grid infrastructure.
        Speaker: Dieter Kranzlmueller (University of Linz)
        Slides
      • 12:00
        Problems from Evolutionary Bioinformatics with Potential for Grid Applications 40m
        Understanding the evolutionary history of genes, proteins, genomes, and whole organisms is an important prerequisite in medical, biological, and bioinformatics research. The essential tool for gaining insights into this history is the reconstruction of evolutionary trees, so-called phylogenies, which has become a common task in biological and bioinformatics research. Biological sequence data such as DNA or protein sequences usually serve as input for phylogenetic analysis, since they preserve traces of the process of mutation and selection. Many problems of phylogenetic analysis are known to be NP-complete or NP-hard; thus, all methods used nowadays for more than about 10 sequences are in fact heuristics. However, these heuristics still suffer from the ever-growing amounts of data in the public databases, since the number of possible phylogenies grows explosively with the number of sequences or species in the analysis. Hence, since the mid-1990s, parallel and distributed computing has entered the field of phylogenetic analysis as a tool to reduce the running time of the analysis. Among the most reliable methods currently in use are those based on statistical principles such as maximum likelihood or Bayesian statistics; however, these are also among the most computationally demanding. Here we address a number of typical problems from the field of evolutionary bioinformatics and phylogenetics and illustrate achievements in parallelizing maximum likelihood methods. In this context we will highlight the impact of different granularities. Furthermore, we will present potential applications and current achievements using grid technologies to further improve performance.
        Speaker: Heiko Schmidt (CIBIV)
        Slides
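The coarse-grained parallelism discussed in the abstract rests on the fact that, under standard models, the alignment log-likelihood is a sum of independent per-site terms, so alignment columns can be partitioned across workers. The sketch below uses a placeholder per-site value where a real implementation would evaluate Felsenstein's pruning algorithm at each site.

```python
# Data-parallel log-likelihood sketch: split alignment columns into
# chunks, score chunks on worker threads, and sum the partial results.
# `site_log_likelihood` is a stand-in for the real per-site computation.

import math
from concurrent.futures import ThreadPoolExecutor

def site_log_likelihood(site):
    # Placeholder: a real implementation runs Felsenstein's pruning
    # algorithm over the tree for this alignment column.
    return math.log(0.25) * len(site)

def chunk_score(sites):
    return sum(site_log_likelihood(s) for s in sites)

def parallel_log_likelihood(sites, n_workers=4):
    """Partition columns into n_workers chunks and sum their scores in parallel."""
    chunks = [sites[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(chunk_score, chunks))

alignment_columns = [("A", "C", "G"), ("A", "A", "A"), ("T", "G", "T")] * 4
serial = chunk_score(alignment_columns)
parallel = parallel_log_likelihood(alignment_columns)
print(abs(serial - parallel) < 1e-9)  # same total, computed in chunks
```

Chunk size here is the "granularity" knob the talk refers to: many small chunks balance load better, while fewer large chunks reduce coordination overhead.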
      • 12:40
        Workshop Closure 10m
        Speakers: Erwin Laure (EGEE/CERN), Wilfried Gansterer (University of Vienna)