I. Stokes-Rees (UNIVERSITY OF OXFORD PARTICLE PHYSICS)
The DIRAC system developed for the CERN LHCb experiment is a grid infrastructure for managing generic simulation and analysis jobs. It enables jobs to be distributed across a variety of computing resources, such as PBS, LSF, BQS, Condor, Globus, LCG, and individual workstations. A key challenge of distributed service architectures is that there is no single point of control over all components. DIRAC addresses this via two complementary features: a distributed Information System, and an XMPP (Extensible Messaging and Presence Protocol) Instant Messaging framework. The Information System provides a concept of local and remote information sources. Any information that is not found locally is fetched from remote sources. This allows a component to define its own state, while fetching the state of other components either directly from those components or via a central Information Service. We will present the architecture, features, and performance of this system. XMPP has provided DIRAC with numerous advantages. As an authenticated, robust, lightweight, and scalable asynchronous message passing system, XMPP is used, in addition to XML-RPC, for inter-Service communication. This makes DIRAC very fault-tolerant, a critical feature when using Service Oriented Architectures. XMPP is also used for monitoring the real-time behaviour of the various DIRAC components. Finally, XMPP provides XML-RPC-like facilities, which are being developed to provide control channels directly to Services, Agents, and Jobs. We will describe our novel use of Instant Messaging in DIRAC and discuss directions for the future.
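The local-then-remote lookup described for the Information System can be illustrated with a minimal sketch. This is not DIRAC code: the class name `InfoSource`, its `get` method, and the example keys are hypothetical, chosen only to show a component answering from its own state first and falling back to remote sources (here, a stand-in for a central Information Service).

```python
class InfoSource:
    """Hypothetical information source: local values plus remote fallbacks."""

    def __init__(self, name, local=None, remotes=None):
        self.name = name
        self.local = dict(local or {})      # state this component defines itself
        self.remotes = list(remotes or [])  # other sources to ask, in order

    def get(self, key):
        # Answer from local state when possible.
        if key in self.local:
            return self.local[key]
        # Otherwise ask each remote source in turn.
        # (No guard against cyclic references; this sketch assumes none exist.)
        for remote in self.remotes:
            try:
                return remote.get(key)
            except KeyError:
                continue
        raise KeyError(key)


# Illustrative wiring: an agent that falls back to a central service.
central = InfoSource("central", local={"site/queue_limit": 100})
agent = InfoSource("agent", local={"agent/status": "running"}, remotes=[central])

local_value = agent.get("agent/status")       # found locally
remote_value = agent.get("site/queue_limit")  # fetched from the central source
```

In a real deployment the remote call would go over the network (for example via XML-RPC or XMPP, as the abstract mentions); the in-process call here only stands in for that step.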