9–11 May 2007
Manchester, United Kingdom

Structural Biology in the context of EGEE

10 May 2007, 14:00 (20 min)
Manchester, United Kingdom

Speaker

Mr Germán Carrera (CNB/CSIC)

Report on the experience (or the proposed activity). It would be very important to mention key services which are essential for the success of your activity on the EGEE infrastructure.

We have adapted our Structural Biology applications for production use over EGEE with
the help of the DIANE framework for resource and job management.
To spread knowledge of our solution, CNB is organizing a seminar for wet-lab users
(Structural Biology researchers) and developers, where we will present it and
collect their feedback on its implementation.
We think that the success of our activity within the Biomed VO and NA4 depends on:
- Production-level quality of the services running on the Grid: for this reason we
need an efficient port of our applications to EGEE and a correct adaptation of the
software to this environment. Interaction with the EGEE infrastructure must not impose
any additional burden on users, as this would discourage potential users from adopting EGEE.
- Reliable availability of EGEE resources: we must be able to rely on them. Dealing with
infrastructure problems external to the code being ported adds a heavy burden on
application developers.
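DIANE organizes work as a master that hands independent tasks to a pool of workers and merges their results. The sketch below illustrates that master/worker (task-farming) pattern in plain Python; it does not use DIANE's actual API. Local threads stand in for grid worker nodes, images are hypothetical flat lists of pixel values, and image averaging (one of the 3D-EM steps mentioned in this abstract) is used only as an illustrative task. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

# Hypothetical stand-in for an EM image: a flattened list of pixel values.
Image = List[float]


def split_into_tasks(n_items: int, chunk_size: int) -> List[range]:
    """Master side: cut the image set into independent index ranges (tasks)."""
    return [range(s, min(s + chunk_size, n_items)) for s in range(0, n_items, chunk_size)]


def worker(images: Sequence[Image], task: range) -> Tuple[Image, int]:
    """Worker side: return the pixel-wise partial sum of one chunk and its size."""
    acc = [0.0] * len(images[0])
    for i in task:
        for p, value in enumerate(images[i]):
            acc[p] += value
    return acc, len(task)


def farm_average(images: Sequence[Image], chunk_size: int, n_workers: int = 4) -> Image:
    """Master side: dispatch tasks, gather partial sums, merge them into the mean image."""
    if not images:
        raise ValueError("need at least one image")
    tasks = split_into_tasks(len(images), chunk_size)
    total = [0.0] * len(images[0])
    count = 0
    # Threads emulate remote workers; on a grid each task would be a separate job.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial, n in pool.map(lambda t: worker(images, t), tasks):
            for p, value in enumerate(partial):
                total[p] += value
            count += n
    return [v / count for v in total]


if __name__ == "__main__":
    imgs = [[float(i), 2.0 * i] for i in range(10)]
    print(farm_average(imgs, chunk_size=3))
```

Because each task only returns a partial sum and a count, the master can merge results in any order and a failed task can simply be resubmitted. This property is what makes such workloads a good fit for a job-management layer on top of EGEE.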

Describe the added value of the Grid for the scientific/technical activity you (plan to) do on the Grid. This should include the scale of the activity and of the potential user community and the relevance for other scientific or business applications

There are various steps in the 3D-EM refinement process that may benefit from Grid
computing. To start with, large numbers of experimental images need to be averaged.
Nowadays, typically tens of thousands of images are used, while future studies may
routinely employ millions of images.
Our group has been developing Xmipp, a package for single-particle 3D-EM image
processing. Using Xmipp, the classification of 91,000 ribosome projections into 4
classes took more than 2500 CPU hours using the resources of the MareNostrum
supercomputer at the Barcelona Supercomputing Center. Because few groups have access
to such resources, we propose to use the EGEE infrastructure for Xmipp (ML2D/ML3D),
in collaboration with the Network of Excellence in 3D-EM. Enabling widespread
adoption of 3D-EM will have a profound long-term impact on our understanding of complex
biological structures (such as viruses, organelles and macromolecular assemblies) and
on the exploitation of their biomedical applications.
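To give a sense of scale, the MareNostrum figure can be turned into a rough per-image cost and an idealized grid turnaround time. This is a back-of-the-envelope sketch under strong assumptions: perfect linear scaling, no queueing, data-transfer or job-failure overhead, and the quoted "more than 2500 CPU hours" treated as an exact value. The worker counts are chosen purely for illustration.

```python
# Back-of-the-envelope estimate. Assumes perfect linear scaling and ignores
# queueing, data transfer and job failures, so real grid times will be longer.
CPU_HOURS = 2500   # lower bound quoted for classifying 91,000 projections into 4 classes
N_IMAGES = 91_000


def cpu_seconds_per_image(cpu_hours: float, n_images: int) -> float:
    """Average CPU time spent per experimental image."""
    return cpu_hours * 3600 / n_images


def wall_clock_hours(cpu_hours: float, workers: int) -> float:
    """Ideal wall-clock time when the work is split evenly over `workers` CPUs."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return cpu_hours / workers


if __name__ == "__main__":
    print(f"{cpu_seconds_per_image(CPU_HOURS, N_IMAGES):.1f} CPU s per image")
    for workers in (1, 50, 200):  # illustrative worker counts
        print(f"{workers:4d} workers -> {wall_clock_hours(CPU_HOURS, workers):7.1f} h")
```

Under these assumptions each image costs roughly 99 CPU seconds, and 200 concurrent workers would reduce the run to about 12.5 hours. Since the cost of ML classification also grows with the number of classes, studies with millions of images would need proportionally larger resources.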

Describe the scientific/technical community and the scientific/technical activity using (planning to use) the EGEE infrastructure. A high-level description is needed (neither a detailed specialist report nor a list of references).

Electron microscopy (EM) is a crucial technique that allows Structural Biology
researchers to characterize macromolecular assemblies in distinct functional states.
Image processing in three-dimensional EM (3D-EM) is used by a flourishing community
(exemplified by the EU-funded 3D-EM NoE) and is characterized by voluminous data and
large computing requirements, making it a problem well suited to Grid computing
and the EGEE infrastructure.

With a forward look to future evolution, discuss the issues you have encountered (or that you expect) in using the EGEE infrastructure. Wherever possible, point out the experience limitations (both in terms of existing services or missing functionality)

Currently, our main concerns regarding the EGEE infrastructure are:
- Better and more widespread support for parallel processing (MPI) is needed to improve
response time in our applications.
- Data management needs improvements in usability and transparency (e.g. a Grid file
system like ELFI).
- EGEE needs to complete its transition to gLite: we still have to use LCG
commands to query the Information System, to work around gLite shortcomings in
detecting resource availability.

Authors

Mr David García (CNB/CSIC) Mr Germán Carrera (CNB/CSIC) Dr Jose María Carazo (CNB/CSIC) Dr José Ramón Valverde (CNB/CSIC)

Co-authors

Mr Adrian Muraru (CERN IT/PSS) Mr Jakub Moscicki (CERN IT/PSS)
