Speaker
Mr
Jakub MOSCICKI
(CERN)
Description
The major GRID infrastructures are designed mainly for batch-oriented
computing with coarse-grained jobs and relatively long job turnaround
times. However, many practical applications in the natural and physical
sciences can easily be parallelized and run as a set of smaller tasks
that require little or no synchronization and can be scheduled
more efficiently. The Distributed Analysis Environment Framework
(DIANE) is a Master-Worker execution skeleton for applications that
complements the GRID middleware stack. Automatic failure recovery and
customizable task dispatching policies make it easy to adapt the behaviour
of the framework to a dynamic and unreliable computing environment. We
present our experience of using the framework with several diverse
real-life applications, including Monte Carlo simulation, physics
data analysis and biotechnology.
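To make the Master-Worker idea concrete, here is a minimal Python sketch, assuming an in-process thread pool stands in for GRID worker jobs. It is not DIANE's API: `run_master_worker`, `work_fn` and `max_attempts` are hypothetical names chosen for illustration. Workers pull tasks from a shared queue, so faster workers naturally take more tasks, and a task whose execution raises an exception is re-queued until it has been tried `max_attempts` times, a simple stand-in for automatic failure recovery.

```python
import queue
import threading


def run_master_worker(tasks, work_fn, n_workers=4, max_attempts=3):
    """Hypothetical Master-Worker skeleton (not the DIANE API).

    The master fills a shared queue; workers pull tasks as they become free.
    A task whose work_fn raises is re-queued until it has been tried
    max_attempts times, then recorded as failed.
    """
    todo = queue.Queue()
    for task_id, payload in enumerate(tasks):
        todo.put((task_id, payload, 0))  # (id, input, attempts so far)

    results, failed = {}, {}
    lock = threading.Lock()

    def worker():
        while True:
            item = todo.get()
            if item is None:  # sentinel from the master: no more work
                todo.task_done()
                return
            task_id, payload, attempts = item
            try:
                value = work_fn(payload)
                with lock:
                    results[task_id] = value
            except Exception as exc:
                if attempts + 1 < max_attempts:
                    # Failure recovery: put the task back so any free worker retries it.
                    # This put happens before task_done(), so todo.join() cannot return early.
                    todo.put((task_id, payload, attempts + 1))
                else:
                    with lock:
                        failed[task_id] = repr(exc)
            finally:
                todo.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    todo.join()  # every task has either succeeded or exhausted its attempts
    for _ in threads:
        todo.put(None)  # one stop sentinel per worker
    for t in threads:
        t.join()
    return results, failed


if __name__ == "__main__":
    attempts_seen = {}
    seen_lock = threading.Lock()

    def flaky_square(x):
        # Simulated unreliable environment: task 3 fails once, task 5 always fails.
        with seen_lock:
            attempts_seen[x] = attempts_seen.get(x, 0) + 1
            n = attempts_seen[x]
        if x == 3 and n == 1:
            raise RuntimeError("simulated worker crash")
        if x == 5:
            raise RuntimeError("always fails")
        return x * x

    results, failed = run_master_worker(range(6), flaky_square, n_workers=3)
    print(dict(sorted(results.items())))  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
    print(failed)                         # {5: "RuntimeError('always fails')"}
```

Pull-based dispatch is chosen here because it balances load without knowing task durations in advance; a real deployment would replace threads with remote worker processes and exceptions with lost-worker detection.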
Interfacing existing sequential applications, including legacy ones,
is straightforward even for non-expert users. We
analyze the runtime efficiency and load balancing of the parallel tasks
in various configurations and diverse computing environments: GRIDs (LCG, Crossgrid),
batch farms and dedicated clusters. In practice, using the
Master/Worker layer dramatically reduces the job turnaround
time, which makes it suitable for short-deadline jobs and interactive data
analysis.
Finally, more complex synchronization patterns beyond trivial
parallelism can easily be introduced, such as arbitrary
dependency graphs (which, unlike DAGs, may contain cycles); these may be
suitable for bioinformatics applications.
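The abstract does not say how cyclic dependencies are executed. One plausible approach, sketched here in Python purely as an illustration, is synchronous (Jacobi-style) fixed-point iteration: in each round every node of the graph runs once on the outputs its dependencies produced in the previous round, and the loop stops when no output changes or after `max_rounds`. `run_cyclic_graph`, `deps`, `initial` and `max_rounds` are hypothetical names; in a Master-Worker setting, the node evaluations of one round would be independent tasks that could be dispatched to workers in parallel.

```python
def run_cyclic_graph(deps, work_fn, initial, max_rounds=100):
    """Hypothetical fixed-point scheduler for a dependency graph that may contain cycles.

    deps:    node -> list of nodes whose outputs it reads
    work_fn: (node, {dep: value}) -> new value for node
    initial: starting value for every node (needed to break the cycles)
    Returns (values, rounds_executed).
    """
    values = dict(initial)
    for round_no in range(1, max_rounds + 1):
        # All nodes in a round only read the previous round's values, so they are
        # mutually independent and could be dispatched as parallel worker tasks.
        new_values = {
            node: work_fn(node, {d: values[d] for d in inputs})
            for node, inputs in deps.items()
        }
        if new_values == values:  # fixed point reached
            return values, round_no
        values = new_values
    return values, max_rounds


if __name__ == "__main__":
    # Two tasks that depend on each other: A <-> B (a cycle, so not a DAG).
    deps = {"A": ["B"], "B": ["A"]}
    halve_input = lambda node, inputs: sum(inputs.values()) // 2
    print(run_cyclic_graph(deps, halve_input, {"A": 8, "B": 8}))
    # ({'A': 0, 'B': 0}, 5)
```

Requiring an initial value for every node is what allows a cycle to start at all; a real application would likely need its own convergence test in place of exact equality.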
Author
Mr
Jakub MOSCICKI
(CERN)