Speaker
V. Garonne
(CPPM-IN2P3 MARSEILLE)
Description
The Workload Management System (WMS) is the core component of the
DIRAC distributed MC production and analysis grid of the LHCb
experiment. It uses a central Task database, accessed via a set of
central Services, with Agents running at each of the LHCb sites.
DIRAC uses a 'pull' paradigm in which Agents request tasks
whenever they detect that their local resources are available.
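The 'pull' paradigm can be illustrated with a minimal sketch. This is not DIRAC code: the class names `TaskQueue` and `Agent`, the method `request_task`, the site name, and the slot-counting logic are all assumptions made for illustration, with an in-memory queue standing in for the central Task database.

```python
from collections import deque
from typing import Optional


class TaskQueue:
    """Hypothetical in-memory stand-in for the central Task database."""

    def __init__(self, tasks):
        self._tasks = deque(tasks)

    def request_task(self, site: str) -> Optional[str]:
        # Hand out the next waiting task, or None when the queue is empty.
        return self._tasks.popleft() if self._tasks else None


class Agent:
    """Hypothetical site-side Agent that pulls work only when it has free slots."""

    def __init__(self, site: str, free_slots: int):
        self.site = site
        self.free_slots = free_slots
        self.running = []

    def poll(self, queue: TaskQueue) -> None:
        # Keep requesting tasks while local resources are available.
        while self.free_slots > 0:
            task = queue.request_task(self.site)
            if task is None:
                break
            self.running.append(task)
            self.free_slots -= 1


queue = TaskQueue(["job-1", "job-2", "job-3"])
agent = Agent("site-A", free_slots=2)  # site name is illustrative only
agent.poll(queue)
print(agent.running)
```

The point of the design is that the central side never pushes work to a site; a task leaves the queue only when an Agent with free capacity asks for it.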
The collaborating central Services allow new components to be
plugged in easily. These Services can perform functions such as
scheduling optimization, task prioritization, job splitting and merging,
to name a few. They also provide job status information for various
monitoring clients. We will discuss the deployment and operation of the
Services, with particular emphasis on robustness and scalability.
The distributed Agents have a modular design, which allows their
functionality to be extended easily to meet the needs of a particular site.
Agent installation has only basic prerequisites, which makes it easy to
incorporate new sites. An Agent can be deployed on the gatekeeper of a
large cluster or on a single worker node of the LCG grid. PBS, LSF, BQS,
Condor, LCG, and Globus can be used as DIRAC computing resources.
The WMS components communicate using XML-RPC and the Jabber instant
messaging protocol, which increases the overall reliability of the
system. Jobs handled by the WMS are described using the ClassAd library,
which facilitates interoperability with other grids.
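As a rough illustration of XML-RPC communication between an Agent and a central Service, the sketch below uses only the Python standard library. The function name `request_task`, the task fields, and the site name are assumptions and do not reflect the actual DIRAC interfaces; the Jabber channel and ClassAd job descriptions are not shown.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical waiting tasks held by the central service.
waiting = ["job-1", "job-2"]


def request_task(site):
    # Return the next task as a small dict, or an empty dict when none is left
    # (XML-RPC cannot marshal None unless allow_none is enabled).
    if not waiting:
        return {}
    return {"id": waiting.pop(0), "site": site}


# Port 0 lets the OS pick a free port; logRequests=False keeps output quiet.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(request_task)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Agent side: call the service over HTTP as if it were a local function.
host, port = server.server_address
proxy = ServerProxy(f"http://{host}:{port}")
first = proxy.request_task("site-A")
print(first)

server.shutdown()
server.server_close()
```

Because the Agent initiates every call, a site in this sketch needs only outbound connectivity to the central Service.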