e-Science and "The Grid" for Bio/Health Informaticians

IT 407 (University of Manchester)

Manchester Computing, Kilburn Building, Oxford Road


The Northwest Institute for Bio-Health Informatics and the training teams from the UK's National Grid Service and the EU-funded OMII-Europe project invite you to participate in their introductory course about e-Science technologies for bio and health informaticians in the UK. This ‘hands-on’ course is aimed at those who are interested in exploring the potential of grid computing to enhance their life sciences or health research.

With an emphasis on application to the Bio/Health domain, this course has been designed to:

  • Reveal and explain the architecture of Grids. Grids provide infrastructure for collaboration by permitting resources such as computers, data and software to be orchestrated across and between research communities that span different organisations.
  • Explore the potential for established grids and e-science technologies to enhance research.
  • Communicate an understanding of how to exploit Grids.

The course focuses on technologies and infrastructures that are in production use at different scales (project, national, international), and then explores progress towards what has sometimes been called “The Grid.”

Audience and pre-requisites

The course is primarily intended for informaticians: members of research projects who are responsible for creating and deploying software applications for the bio/health domains. Prior experience of the Linux command line or of editing files on Linux will be useful in a minority of the practicals, but it is not an essential pre-requisite.

Teaching outcomes

In this course attendees will:
  • Learn and understand the concepts and architecture of Grids including the UK’s National Grid Service (NGS) and the EU’s EGEE infrastructures.
  • Learn how e-Science technologies have been applied in the Bio/Health domain
  • Gain hands-on experience using a variety of software, including tools for:
    • Workflow (with Taverna and also P-GRADE)
    • Computation on Grids
    • Data management on Grids
  • Gain insight into the challenges of using the technologies
By the end of the course, attendees should be able to communicate effectively about e-Science technologies, and should have gained sufficient knowledge and hands-on experience to begin working with them.

Location: Room IT407. School of Computer Science, Kilburn Building, Oxford Road

A campus map can be found here. Kilburn is building number 39, in the centre of the map. For more general information see here.

(This building can be seen from the main road - "Kilburn Building")

Costs

For participants from a UK university or related institution, the course is free.

Registration

Please register here.

The following agenda is subject to minor amendment.

A booklet will be provided on the day, but material that has changed since printing or is not in the booklet will be available from the NIBHI website.

    • 10:30 11:00
      Welcome and introduction 30m

      This talk gives an overview of the week's aims, providing a roadmap of the event, which is as follows:

      e-science is concerned with how resources such as compute power and data can be orchestrated by a researcher, or by their collaboration, to achieve research goals more easily, more quickly and more powerfully. A rich ecosystem of technologies exists to support e-science, and the week introduces these from the viewpoint of the researcher. Each day introduces different technologies and infrastructures that expand the scope of this orchestration:

      • Day 1 introduces concepts, and then uses Taverna to show how workflows can be constructed so that many computational steps can be orchestrated to build a complex analysis.
      • Day 2 pursues the theme of workflow, expanding the scope of orchestration to include web services. The myGrid project uses Taverna to build workflows from unsecured web services exposed on the Internet.
      • Day 3 begins to explore e-science at the UK scale: the National Grid Service (NGS) is an infrastructure for researchers who collaborate across UK organisations, or who need resources beyond those available in their own institution.
      • Day 4 focuses on data services on the NGS. It addresses the question: "I know I can access diverse compute resources on the NGS, but how do I manage, share and access data?"
      • Day 5 takes an international view: it outlines the infrastructure available to those collaborating in international research.
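The idea of orchestrating many computational steps into one analysis can be illustrated with a minimal Python sketch. It is not Taverna's engine or workflow format: the step names (`fetch_sequence`, `count_gc`, `sequence_length`, `report`) and the hard-coded sequence are hypothetical. The workflow is modelled as a dependency graph, and each step runs only once the steps it depends on have produced their outputs.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow steps; each receives the outputs of its upstream steps.
def fetch_sequence(inputs):
    # Stand-in for a call to a remote sequence database.
    return "ATGCGCGTTA"

def count_gc(inputs):
    seq = inputs["fetch_sequence"]
    return sum(1 for base in seq if base in "GC")

def sequence_length(inputs):
    return len(inputs["fetch_sequence"])

def report(inputs):
    return f"GC content: {inputs['count_gc']}/{inputs['sequence_length']}"

STEPS = {
    "fetch_sequence": fetch_sequence,
    "count_gc": count_gc,
    "sequence_length": sequence_length,
    "report": report,
}

# Each step maps to the set of steps whose outputs it consumes.
DEPENDENCIES = {
    "fetch_sequence": set(),
    "count_gc": {"fetch_sequence"},
    "sequence_length": {"fetch_sequence"},
    "report": {"count_gc", "sequence_length"},
}

def run_workflow(steps, dependencies):
    """Run every step in an order that respects its dependencies."""
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        upstream = {dep: results[dep] for dep in dependencies[name]}
        results[name] = steps[name](upstream)
    return results

if __name__ == "__main__":
    print(run_workflow(STEPS, DEPENDENCIES)["report"])  # GC content: 5/10
```

Because `count_gc` and `sequence_length` do not depend on each other, a real workflow engine could run them in parallel; this sketch simply runs them one after another.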
    • 11:00 11:45
      Introduction to e-science concepts 45m

      "e-science" is enhanced science: research that is carried out by collaborations enabled by the Internet, and using dynamically orchestrated resources, for example of data and compute-power. These resources may be within a University, in institutes within a country, or be international.

    • 11:45 12:00
      break 15m
    • 12:00 13:00
      Case studies introducing successful eScience projects 1h
      Speakers: Georgina Moulton (The University of Manchester), Dr. Katy Wolstencroft (The University of Manchester, myGrid Team), Mr. Paul Fisher (The University of Manchester)
    • 13:00 14:00
      Lunch 1h
    • 14:00 14:30
      A Dictionary of Terms for Taverna 30m
      Taverna allows a biologist or bioinformatician with limited computing background and limited technical resources and support to construct highly complex analyses over public and private data and computational resources, all from a standard PC, UNIX box or Apple computer. With a combination of case studies, talks and practicals the concepts of building workflows with Taverna will be presented.
      Speaker: Georgina Moulton (The University of Manchester)
    • 14:30 15:30
      Basic Features of Taverna - A Practical 1h

      This first practical introduces participants to Taverna. By the end, they will be able to load workflows from various locations, build and run a simple workflow, search for web services and add others, and save or export workflows.

    • 15:30 15:50
      break 20m
    • 15:50 16:50
      Basic Features of Taverna - A Practical Continued 1h
    • 16:50 17:00
      Discussion 10m
      A chance for participants to discuss the relevance of the NGS services to their own research interests
    • 09:30 09:40
      Introduction to day 2 10m
      Today builds on yesterday's introduction to workflow, and introduces web services that permit distributed processing, before showing how myGrid uses Taverna to orchestrate web services.
    • 09:40 09:50
      Advanced Features of Taverna 10m
      Speakers: Dr. Georgina Moulton (University of Manchester), Dr. Katy Wolstencroft (The University of Manchester)
    • 09:50 11:00
      Advanced Features of Taverna - A Practical 1h 10m
    • 11:00 11:20
      break 20m
    • 11:20 12:40
      Advanced Features of Taverna - A Practical Continued 1h 20m
    • 12:40 13:00
      Workflow Issues 20m
    • 13:00 14:00
      Lunch 1h
    • 14:00 15:30
      Building Web Services - A Practical 1h 30m
    • 15:30 15:50
      break 20m
    • 15:50 16:35
      A Taste of the Future - Taverna2 45m
      Speaker: Mr. Stian Soiland (The University of Manchester, myGrid Team)
    • 16:35 17:00
      Final Discussion 25m
    • 09:30 09:40
      Introduction to day 3 10m
      This day explores concepts of grid computing, introduces the UK's National Grid Service and gives experience of some of the alternative ways to run compute jobs on the NGS:
      • The Resource Broker is a grid service that abstracts the compute resources of the NGS: users do not need to identify a specific compute node for their job, but simply submit it to the NGS Resource Broker. A user writes a simple text file that describes the job and its requirements, then uses command-line client software to send this file to the Resource Broker. The broker matches the job's requirements to the available compute resources and submits the job to a compute node on the user's behalf.
      • The NGS Applications Repository uses an alternative way to describe a job, and holds a library of such descriptions so that frequently run jobs can easily be submitted to an NGS resource. It is accessed through a portal in a web browser, so the user needs no client software.
      • The P-GRADE portal brings together grid computing and workflow.
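The matchmaking performed by the Resource Broker can be pictured with a toy Python model. This is not the broker's real job description language or its matching algorithm: the "key = value" format, the attribute names (`min_memory_mb`, `software`, `free_cpus`) and the resource names are all hypothetical. The job's requirements are checked against each resource's advertised properties, and the best remaining candidate is chosen.

```python
from dataclasses import dataclass, field

# Hypothetical job description in a simplified "key = value" text format.
JOB_FILE = """
executable = blast_search.sh
min_memory_mb = 2048
software = blast
"""

@dataclass
class Resource:
    name: str
    memory_mb: int
    free_cpus: int
    software: set = field(default_factory=set)

def parse_job(text):
    """Turn 'key = value' lines into a dictionary, skipping blank lines."""
    job = {}
    for line in text.strip().splitlines():
        if line.strip():
            key, value = (part.strip() for part in line.split("=", 1))
            job[key] = value
    return job

def match(job, resources):
    """Return the suitable resource with the most free CPUs, or None."""
    candidates = [
        r for r in resources
        if r.memory_mb >= int(job["min_memory_mb"])
        and job["software"] in r.software
        and r.free_cpus > 0
    ]
    return max(candidates, key=lambda r: r.free_cpus, default=None)

if __name__ == "__main__":
    resources = [
        Resource("node-a", memory_mb=1024, free_cpus=32, software={"blast"}),
        Resource("node-b", memory_mb=4096, free_cpus=8, software={"blast"}),
        Resource("node-c", memory_mb=8192, free_cpus=16, software={"gromacs"}),
    ]
    chosen = match(parse_job(JOB_FILE), resources)
    print(chosen.name if chosen else "no matching resource")  # node-b
```

Ranking by free CPUs is just one plausible policy chosen for the sketch; a production broker may weigh resources quite differently.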
    • 09:40 10:20
      An Introduction to Grid Computing and the National Grid Service 40m
    • 10:20 11:00
      Gaining access to the NGS 40m
      This talk describes the procedure for gaining access as a user (access is free to UK academic researchers). It also introduces the UK Certificate Authority and the X.509 certificates that are required to use the NGS.
    • 11:00 11:15
      break 15m
    • 11:15 12:30
      The Resource Broker 1h 15m
      practical 1
      practical 2
    • 12:30 13:00
      The NGS Applications Repository 30m
      Introduction to the repository (also known as the "NGS Portal").
    • 13:00 14:00
      Lunch 1h
    • 14:00 14:40
      The NGS Applications Repository - practical 40m
    • 14:40 15:30
      The P-Grade Portal 50m
      Speaker: Tamas Kiss (University of Westminster)
    • 15:30 15:50
      break 20m
    • 15:50 16:30
      P-Grade continued 40m
      Speaker: Tamas Kiss (University of Westminster)
      Hands-on material (PDF)
    • 16:30 16:50
      Case Study - GENIUS 20m
      Grid Enabled Neurosurgical Imaging Using Simulation - a presentation provided by Stefan Zasada of UCL.
    • 16:50 17:00
      End of day discussion 10m
    • 09:30 09:40
      Introduction to day 4 10m
      Today the focus is on data and storage services on the NGS:
      • The Storage Resource Broker (SRB) provides a POSIX-like interface to a virtual filesystem, so files can be stored in the SRB and accessed from any NGS node. This is the recommended way for users and collaborations to hold their files.
      • GridFTP permits files to be transferred efficiently between nodes of the NGS.
      • Oracle. The NGS has deployed an Oracle service, and this is available to host researchers' relational data.
      • OGSA-DAI: this toolkit primarily provides a way for data that are not held on grid-specific resources (such as the SRB or the NGS Oracle service) to be exposed in a controlled way to applications that run on a grid. (OGSA = "Open Grid Services Architecture", an architecture for using web services to build grids; DAI = "Data Access and Integration".) An OGSA-DAI service is deployed by the owner of the data resource, together with installed "activities" that a user can invoke via a Java client. These activities execute as steps in a workflow, close to where the data resources are held. The data may be relational, XML or flat files. OGSA-DAI is an extensible toolkit: the publisher of the data can deploy new data resource types and new activities. OGSA-DAI is developed as part of the OMII-UK initiative.
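The idea that OGSA-DAI activities run as workflow steps close to the data can be sketched in Python. This is not the OGSA-DAI Java API: the activity functions (`sql_query`, `threshold`, `to_csv`) and the `samples` table are hypothetical, and an in-memory SQLite database stands in for the data owner's relational resource. The query and the filtering both run where the data live, and only the small result leaves the resource.

```python
import sqlite3

def make_resource():
    """Hypothetical data owner's database, held in memory for this sketch."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (id INTEGER, tissue TEXT, expression REAL)")
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?, ?)",
        [(1, "liver", 2.5), (2, "brain", 7.1), (3, "liver", 9.4), (4, "heart", 0.8)],
    )
    return conn

# Hypothetical "activities": each consumes the previous step's output.
def sql_query(conn, query, params=()):
    return conn.execute(query, params).fetchall()

def threshold(rows, minimum):
    # Keep rows whose expression value (third column) meets the minimum.
    return [row for row in rows if row[2] >= minimum]

def to_csv(rows):
    return "\n".join(",".join(str(value) for value in row) for row in rows)

def run_pipeline(conn, tissue, minimum):
    """Chain the activities next to the data; only CSV text is returned."""
    rows = sql_query(
        conn,
        "SELECT id, tissue, expression FROM samples WHERE tissue = ? ORDER BY id",
        (tissue,),
    )
    return to_csv(threshold(rows, minimum))

if __name__ == "__main__":
    print(run_pipeline(make_resource(), "liver", 5.0))  # 3,liver,9.4
```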
    • 09:40 10:00
      Data and Storage Services on the NGS 20m
    • 10:00 11:00
      Storage Resource Broker 1h
      GridFTP practical
      SRB practical
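The SRB's virtual filesystem can be thought of as a catalogue that maps logical paths to physical copies held on different storage nodes. The following Python sketch is a toy model, not the SRB client or its commands: the `ToyCatalogue` class, its methods and the node names are hypothetical. A user works only with logical paths, and the catalogue decides which physical copy to read.

```python
class ToyCatalogue:
    """Maps logical paths to copies stored on (simulated) storage nodes."""

    def __init__(self):
        self.replicas = {}  # logical path -> list of node names
        self.storage = {}   # (node, logical path) -> file contents

    def put(self, logical_path, data, nodes):
        """Store a copy of the data on each named node."""
        for node in nodes:
            self.storage[(node, logical_path)] = data
        self.replicas[logical_path] = list(nodes)

    def get(self, logical_path, available_nodes):
        """Read from the first copy whose node is currently available."""
        for node in self.replicas.get(logical_path, []):
            if node in available_nodes:
                return self.storage[(node, logical_path)]
        raise FileNotFoundError(logical_path)

    def ls(self, directory):
        """List logical paths directly or indirectly under a directory."""
        prefix = directory.rstrip("/") + "/"
        return sorted(p for p in self.replicas if p.startswith(prefix))

if __name__ == "__main__":
    srb = ToyCatalogue()
    srb.put("/home/project/results.txt", b"p-value 0.01", nodes=["node-1", "node-2"])
    print(srb.ls("/home/project"))
    # node-1 is unavailable, so the copy on node-2 is read instead.
    print(srb.get("/home/project/results.txt", available_nodes={"node-2"}))
```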
    • 11:00 11:20
      break 20m
    • 11:20 11:40
      GridFTP 20m
    • 11:40 13:00
      The NGS Oracle Service 1h 20m
      Speakers: Keir Hawker (Rutherford Laboratories), Simon Collins (University of Manchester)
      Practical 1
    • 13:00 14:00
      Lunch 1h
    • 14:00 14:45
      The NGS Oracle Service (continued) 45m
    • 14:45 15:30
      Grid Data Services using OGSA-DAI 45m
    • 15:30 15:50
      break 20m
    • 15:50 16:20
      OGSA-DAI continued 30m
    • 16:20 17:00
      End of day discussion 40m
    • 09:30 09:40
      Introduction to day 5 10m

      The morning session continues to focus on the NGS. Horizons are then broadened from the UK emphasis of the last two days to the EGEE (Enabling Grids for E-sciencE) infrastructure for international e-research, which uses the gLite middleware.

      After lunch, horizons are broadened further to survey the "grid islands" that exist: different grids are based on different middleware, which isolates resources and researchers on their chosen island. With reference to the WISDOM project, which uses grid computing to address drug discovery for neglected and emerging diseases, the session explores the need for interoperability across grids.

      OMII-Europe is creating standards-based components to permit a collaboration to access resources across different grids.

      The final talk summarises the options open to participants, should they wish to:

      • deploy Taverna
      • join the NGS as a user or as a new project
      • gain experience of the EGEE grid and its gLite middleware
      • evaluate the OMII-Europe components
    • 09:40 10:15
      Computational steering on the NGS 35m
    • 10:15 11:00
      Virtual Organisations and the NGS 45m
      The NGS recently deployed VOMS, the Virtual Organisation Membership Service. This talk explains virtual organisations and VOMS, and their implications for NGS users.
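With VOMS, a user's membership of a virtual organisation is carried as attributes in their proxy credential, written as fully qualified attribute names (FQANs) of the form `/vo/group/Role=role/Capability=capability`. The Python sketch below is a simplified, hypothetical authorisation check, not the VOMS software or any NGS policy: the VO name `examplevo`, the role `swmanager` and the `POLICY` table are invented for illustration. The point is that a resource can grant access according to the groups and roles a user presents, rather than by listing every individual certificate.

```python
# Hypothetical policy: which VO groups/roles may perform which actions.
# A rule (group, None) admits any member of that group or its subgroups.
POLICY = {
    "submit_job": [("/examplevo", None)],
    "install_software": [("/examplevo", "swmanager")],
}

def parse_fqan(fqan):
    """Split '/vo/group/Role=r/Capability=c' into (group path, role or None)."""
    group_parts, role = [], None
    for part in fqan.strip("/").split("/"):
        if part.startswith("Role="):
            value = part[len("Role="):]
            role = None if value == "NULL" else value
        elif not part.startswith("Capability="):
            group_parts.append(part)
    return "/" + "/".join(group_parts), role

def is_authorised(fqans, action):
    """True if any presented FQAN satisfies a policy rule for the action."""
    for fqan in fqans:
        group, role = parse_fqan(fqan)
        for rule_group, rule_role in POLICY.get(action, []):
            in_group = group == rule_group or group.startswith(rule_group + "/")
            if in_group and (rule_role is None or role == rule_role):
                return True
    return False

if __name__ == "__main__":
    member = ["/examplevo/Role=NULL/Capability=NULL"]
    manager = ["/examplevo/Role=swmanager/Capability=NULL"]
    print(is_authorised(member, "submit_job"))         # True
    print(is_authorised(member, "install_software"))   # False
    print(is_authorised(manager, "install_software"))  # True
```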
    • 11:00 11:20
      break 20m
    • 11:20 12:40
      EGEE: an infrastructure for international e-science 1h 20m

      EGEE is an EU initiative to build international grid infrastructure. This talk describes the status of the EGEE infrastructure, gives an overview of EGEE's gLite middleware, and outlines the bio/health applications that are benefitting from EGEE.

    • 12:40 13:40
      Lunch 1h
    • 13:40 14:40
      OMII-Europe: Bridging grids 1h
    • 14:40 15:10
      Next steps 30m
    • 15:10 15:40
      Closing discussion 30m
    • 15:40 15:55
      Coffee 15m