9–11 May 2007
Manchester, United Kingdom
Europe/Zurich timezone

WISDOM production environment

9 May 2007, 17:30
2h 30m
Manchester, United Kingdom

Board: D-016
demo presentation, On-line Demonstrations, Poster and Demo Session

Speakers

Mr Jean Salzemann (IN2P3/CNRS), Mr Vincent Bloch (IN2P3/CNRS), Mr Vinod Kasam (IN2P3/CNRS)

With a forward look to future evolution, discuss the issues you have encountered (or that you expect) in using the EGEE infrastructure. Wherever possible, point out the limitations you experienced (in terms of both existing services and missing functionality).

The grid is still a very unpredictable system, with many single points of failure that reduce efficiency as the grid becomes more and more overloaded. For instance, resource brokers are still a source of many problems: they can easily become overloaded by job submissions and are sometimes inefficient at scheduling jobs.

Describe the scientific/technical community and the scientific/technical activity using (planning to use) the EGEE infrastructure. A high-level description is needed (neither a detailed specialist report nor a list of references).

During 2005 and 2006, three biomedical data challenges were run on the EGEE grid: two on malaria and one on avian flu. These deployments, driven by relevant biological needs, were completed successfully using most of the available resources of the Biomed virtual organisation. In total, almost 600 years of computation were completed during these three deployments using the WISDOM production environment.

Describe the added value of the Grid for the scientific/technical activity you (plan to) do on the Grid. This should include the scale of the activity and of the potential user community and the relevance for other scientific or business applications

With WISDOM, we wanted to produce a straightforward application that is “easy” to use for non-grid experts and that can integrate any type of docking software.
The system was designed to deploy high-throughput experiments on the grid and is being re-engineered to offer a fully interoperable web services interface, with connections to databases for storing and querying statistics and results in almost real time.
One of the major added values of this new architecture is that the whole system can be easily integrated into workflow engines, which simply call the relevant operations. Development focused on fault tolerance, flexibility and scalability, but several issues arose during the experiments.

Report on the experience (or the proposed activity). It would be very important to mention key services which are essential for the success of your activity on the EGEE infrastructure.

The environment consists of a set of scripts that generate the jobs, submit the files, and regularly check job status through the workload management system while the jobs are on the grid. Several actions are taken depending on job status. The main one is resubmission: a job is resubmitted if it has been aborted by the workload management system, cancelled by the user, or has failed because of problems during the run.
The environment can also automatically cancel a job that has stayed in a queue for too long; in this case, the job is resubmitted after it has been cancelled.
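
The status-driven policy described above can be sketched as a small decision function. This is only an illustration in Python, not the WISDOM scripts themselves: the status names, the `Action` values, the `decide_action` helper and the six-hour queue threshold are all assumptions, since the abstract does not give them.

```python
from enum import Enum


class JobStatus(Enum):
    # Hypothetical status names; the real workload management system
    # may report different ones.
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"
    ABORTED = "aborted"
    CANCELLED = "cancelled"
    FAILED = "failed"


class Action(Enum):
    WAIT = "wait"
    RESUBMIT = "resubmit"
    CANCEL_AND_RESUBMIT = "cancel_and_resubmit"
    FINISHED = "finished"  # assumption: nothing more to do for a done job


# Assumed threshold; the abstract does not say how long "too long" is.
MAX_QUEUE_SECONDS = 6 * 3600


def decide_action(status, seconds_in_queue=0.0,
                  max_queue_seconds=MAX_QUEUE_SECONDS):
    """Map a job's current status to the action the monitor would take."""
    # Aborted, cancelled and failed jobs are all resubmitted.
    if status in (JobStatus.ABORTED, JobStatus.CANCELLED, JobStatus.FAILED):
        return Action.RESUBMIT
    # A job stuck in a queue is cancelled first, then resubmitted.
    if status is JobStatus.QUEUED and seconds_in_queue > max_queue_seconds:
        return Action.CANCEL_AND_RESUBMIT
    if status is JobStatus.DONE:
        return Action.FINISHED
    # Running jobs, and queued jobs within the limit, are left alone.
    return Action.WAIT
```

A monitoring loop would call `decide_action` for each job at every polling interval and carry out the returned action.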
The submission process is performed by a multithreaded Java submission engine that can submit multiple jobs to several resource brokers in round-robin fashion. The job results are stored directly on the grid storage elements, and the useful scoring information is recorded directly in a relational database.
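
Round-robin submission from several threads can be illustrated with a short sketch. The actual engine is written in Java and its code is not shown in this abstract, so the Python below is only an assumption-laden analogue: `assign_round_robin`, `submit_all`, `fake_submit` and the broker names are hypothetical, and `fake_submit` stands in for a real submission call.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def assign_round_robin(jobs, brokers):
    """Pair each job with a broker, cycling through the brokers in order."""
    if not brokers:
        raise ValueError("at least one resource broker is required")
    return list(zip(jobs, cycle(brokers)))


def submit_all(jobs, brokers, submit, max_workers=4):
    """Submit jobs concurrently and return the results in job order."""
    assignments = assign_round_robin(jobs, brokers)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even though calls run in parallel.
        return list(pool.map(lambda pair: submit(*pair), assignments))


def fake_submit(job, broker):
    # Hypothetical stand-in for a real grid submission command.
    return f"{job}@{broker}"


if __name__ == "__main__":
    print(submit_all(["job1", "job2", "job3"], ["rb1", "rb2"], fake_submit))
```

Spreading jobs across brokers this way limits the load placed on any single resource broker, which matters given the overload problems reported above.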

Author

Mr Jean Salzemann (IN2P3/CNRS)

Co-authors

Mr Vincent Bloch (IN2P3/CNRS), Mr Vinod Kasam (IN2P3/CNRS)
