27 September 2004 to 1 October 2004
Interlaken, Switzerland
Europe/Zurich timezone

DIRAC Workload Management System

29 Sept 2004, 10:00
1h
Coffee (Interlaken, Switzerland)

Board: 25
Poster, Track 5 - Distributed Computing Systems and Experiences, Poster Session 2

Speaker

V. Garonne (CPPM-IN2P3 MARSEILLE)

Description

The Workload Management System (WMS) is the core component of DIRAC, the distributed Monte Carlo (MC) production and analysis grid of the LHCb experiment. It uses a central Task database, accessed through a set of central Services, with Agents running at each of the LHCb sites. DIRAC uses a 'pull' paradigm in which Agents request tasks whenever they detect that their local resources are available. The collaborating central Services allow new components to be plugged in easily. These Services can perform functions such as scheduling optimization, task prioritization, and job splitting and merging, to name a few. They also provide job status information to various monitoring clients. We will discuss the deployment and operation of the Services, with particular emphasis on robustness and scalability issues. The distributed Agents have a modular design that allows their functionality to be extended easily to suit the needs of a particular site. The Agent installation has only basic prerequisites, which makes it easy to incorporate new sites. An Agent can be deployed on the gatekeeper of a large cluster or on a single worker node of the LCG grid. PBS, LSF, BQS, Condor, LCG and Globus can be used as DIRAC computing resources. The WMS components communicate using XML-RPC and Jabber instant messaging protocols, which increases the overall reliability of the system. The jobs handled by the WMS are described using the ClassAd library, which facilitates interoperability with other grids.
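
The 'pull' paradigm described above can be illustrated with a minimal sketch. This is not DIRAC code: the names CentralTaskQueue, Agent, Task, request_task and poll are hypothetical, the in-memory heap merely stands in for the central Task database and its Services, and the single site requirement is a simplified stand-in for a ClassAd-style job description. The sketch only shows the control flow: the central side never pushes work, and an Agent asks for a task only while it has free local slots.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(order=True)
class Task:
    """Hypothetical task record; only `priority` takes part in ordering."""
    priority: int                                   # lower value is served first
    job_id: str = field(compare=False)
    site_requirement: Optional[str] = field(default=None, compare=False)


class CentralTaskQueue:
    """Hypothetical stand-in for the central Task database and Services."""

    def __init__(self) -> None:
        self._heap: List[Task] = []

    def submit(self, task: Task) -> None:
        heapq.heappush(self._heap, task)

    def request_task(self, site: str) -> Optional[Task]:
        """Return the highest-priority task allowed to run at `site`, or None."""
        skipped: List[Task] = []
        match: Optional[Task] = None
        while self._heap:
            task = heapq.heappop(self._heap)
            if task.site_requirement in (None, site):
                match = task
                break
            skipped.append(task)
        # Put back the tasks that this site could not run.
        for task in skipped:
            heapq.heappush(self._heap, task)
        return match


class Agent:
    """Hypothetical site Agent that pulls work when local slots are free."""

    def __init__(self, site: str, free_slots: int, queue: CentralTaskQueue) -> None:
        self.site = site
        self.free_slots = free_slots
        self.queue = queue

    def poll(self) -> List[str]:
        """Request tasks while slots are free; return the job ids taken."""
        started: List[str] = []
        while self.free_slots > 0:
            task = self.queue.request_task(self.site)
            if task is None:
                break                               # nothing suitable is waiting
            self.free_slots -= 1
            started.append(task.job_id)
        return started
```

In this sketch an Agent with no free slots never contacts the queue, and a task restricted to another site stays queued until an Agent at that site asks for it, which is the essential difference from a 'push' scheduler that must track the state of every site centrally.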

Authors

A. Tsaregorodtsev (CPPM-IN2P3 MARSEILLE), I. Stokes-Rees (Oxford), V. Garonne (CPPM-IN2P3 MARSEILLE)
