Describe the scientific/technical community and the scientific/technical activity using (planning to use) the EGEE infrastructure. A high-level description is needed (neither a detailed specialist report nor a list of references).
Efficient computer-based drug discovery for neglected and emerging diseases relies on molecular docking simulation, in which potential drug candidates are selected from a very large number of chemical compounds (~10^6). The previous grid challenge, which prepared for fighting avian flu mutations, demonstrated that biomedical communities can benefit from the EGEE infrastructure in terms of screening speed and reaction time across the full spectrum of compound libraries.
Report on the experience (or the proposed activity). It would be very important to mention key services which are essential for the success of your activity on the EGEE infrastructure.
The system uses DIANE to distribute docking simulations on the grid. DIANE features an agent-based task-pulling model with a high-level failure recovery mechanism to ensure steady job throughput. The system can also use as many grid resources as are available by running multiple DIANE instances.
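The task-pulling idea can be illustrated with a minimal sketch. This is not DIANE's API: `run_pull_model`, `flaky_dock`, the retry limit and the use of local threads in place of grid worker agents are all assumptions made for illustration. Workers pull tasks from a shared queue, and a task that fails is put back on the queue so that another pull can retry it.

```python
import queue
import threading


def run_pull_model(tasks, worker_fn, n_workers=4, max_retries=3):
    """Workers pull tasks from a shared queue; failed tasks are re-queued.

    Hypothetical illustration only; threads stand in for grid worker agents.
    """
    pending = queue.Queue()
    for task in tasks:
        pending.put((task, 0))  # (task, number of attempts so far)

    results, failed = {}, {}
    lock = threading.Lock()

    def agent():
        while True:
            try:
                task, attempts = pending.get_nowait()  # pull the next task
            except queue.Empty:
                return  # nothing left to pull
            try:
                value = worker_fn(task)
            except Exception as exc:
                if attempts + 1 < max_retries:
                    pending.put((task, attempts + 1))  # recover by re-queuing
                else:
                    with lock:
                        failed[task] = str(exc)  # give up after max_retries
            else:
                with lock:
                    results[task] = value

    threads = [threading.Thread(target=agent) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, failed


# Hypothetical stand-in for one docking simulation: every third ligand
# fails on its first attempt to mimic an unexpected worker node error.
_attempts = {}
_attempts_lock = threading.Lock()


def flaky_dock(ligand_id):
    with _attempts_lock:
        _attempts[ligand_id] = _attempts.get(ligand_id, 0) + 1
        attempt = _attempts[ligand_id]
    if ligand_id % 3 == 0 and attempt == 1:
        raise RuntimeError("simulated worker node failure")
    return -5.0 - 0.1 * ligand_id  # made-up docking score
```

Calling `run_pull_model(range(10), flaky_dock)` should return a result for every ligand despite the simulated failures, because a worker that re-queues a task keeps pulling until the queue is empty.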
The distributed DIANE instances are organized by a Virtual Queuing System, part of the Grid Application Platform developed by ASGC. Through it, users can manage the distributed DIANE instances in the same way as they control jobs in a job queuing system.
Essential information from the simulation results is stored in the AMGA catalogue system as metadata. Aggregate data analysis can then be done with AMGA queries rather than by inspecting results that are widely distributed across the grid storage elements.
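The benefit of querying metadata instead of opening each result file can be sketched as follows. The sketch uses an in-memory SQLite database as a stand-in for the metadata catalogue; it does not use the AMGA client, and the table layout, column names and sample values are hypothetical.

```python
import sqlite3


def build_catalogue(records):
    """Create an in-memory table holding one metadata row per docking result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docking ("
        "compound TEXT, target TEXT, energy REAL, output_file TEXT)"
    )
    conn.executemany("INSERT INTO docking VALUES (?, ?, ?, ?)", records)
    return conn


def best_compounds(conn, target, limit):
    """Return the `limit` compounds with the lowest energy for one target."""
    return conn.execute(
        "SELECT compound, energy FROM docking "
        "WHERE target = ? ORDER BY energy ASC LIMIT ?",
        (target, limit),
    ).fetchall()


# Hypothetical sample metadata: (compound, target, energy, output file).
SAMPLE = [
    ("c1", "target_A", -7.2, "results/c1.out"),
    ("c2", "target_A", -9.1, "results/c2.out"),
    ("c3", "target_B", -8.0, "results/c3.out"),
    ("c4", "target_A", -6.5, "results/c4.out"),
]
```

With this layout, `best_compounds(build_catalogue(SAMPLE), "target_A", 2)` ranks compounds from the metadata alone; the output files themselves are only needed for the compounds selected for refinement.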
The thin client consists of a set of Java APIs and a command shell. It can be launched in any Java-enabled desktop environment, which makes it possible to integrate the grid with desktop utilities.
Describe the added value of the Grid for the scientific/technical activity you (plan to) do on the Grid. This should include the scale of the activity and of the potential user community and the relevance for other scientific or business applications.
The grid is an ideal environment providing “on-demand” resources for docking simulation. The objective of our work is to deliver a productive system that enables biologists to run docking simulations and manage the docking results on the grid as simply as using a desktop utility in their daily research.
Based on our experience in running the previous computational challenge, the system integrates several existing technologies to improve the usability, scalability and stability of running molecular docking simulations on the grid.
During the second grid challenge for avian flu drug analysis, which started this August, biologists used the system to run large-scale docking simulations on the EGEE infrastructure. Within the same system, biologists also performed a first-level analysis of the distributed results to plan a refinement simulation. A significant part of the simulation work was done by non-grid experts on Windows desktops.
Abstracts for online demonstrations must provide a summary of the demo content. Places for demos are limited and this summary will be used as part of the selection procedure. Please include the visual impact of the demo and highlight any specific requirements (e.g. network connection). In general, a successful demo is expected to have some supporting material (poster) and be capable of running on a single screen or projector.
The demo will show a real production system used by biology users to run docking simulations on the EGEE infrastructure.
In the demo, a real-life job containing about 1000 docking simulations (~6000 minutes of CPU time) will be prepared and submitted from the thin client running on a laptop. Job preparation consists of answering a few application-level questions. A few hundred worker nodes on the EGEE infrastructure will be requested to process the simulations.
While the job is running, a real-time progress monitor will show steady job throughput even when unexpected errors occur on some grid worker nodes. We will also demonstrate how users can visualize the results produced by the completed simulations and perform an aggregate analysis across the simulations (e.g. a binding-energy histogram).
For better demo performance, a wired internet connection is preferred.
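As an illustration of the kind of aggregate analysis meant here, a binding-energy histogram can be computed in a few lines. This is a sketch only: the function names, the bin width and the text rendering are assumptions, and the demo's actual visualization is not described in this abstract.

```python
import math


def energy_histogram(energies, bin_width=1.0):
    """Count energies per bin; each key is the lower edge of a bin."""
    counts = {}
    for energy in energies:
        lower = math.floor(energy / bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


def render_histogram(counts, bin_width=1.0):
    """Render the histogram as text, one bar of '#' characters per bin."""
    lines = []
    for lower, n in counts.items():
        upper = lower + bin_width
        lines.append(f"[{lower:6.1f}, {upper:6.1f}) {'#' * n}")
    return "\n".join(lines)
```

For example, `render_histogram(energy_histogram(values))` applied to the energies of the completed simulations gives a quick view of how many compounds fall into each energy range.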