The LHCb experiment relies on LHCbDIRAC, an extension of DIRAC, to drive its offline computing. This middleware provides a development framework and a complete set of components for building distributed computing systems. These components are currently installed and ran on virtual machines (VM) or bare metal hardware. Due to the increased load of work, high availability is becoming more and more important for the LHCbDIRAC services, and the current installation model is showing its limitations.
Apache Mesos is a cluster manager which aims at abstracting heterogeneous physical resources on which various tasks can be distributed thanks to so called "framework". The Marathon framework is suitable for long running tasks such as the DIRAC services, while the Chronos framework meets the needs of cron-like tasks like the DIRAC agents. A combination of the service discovery tool Consul together with HAProxy allows to expose the running containers to the outside world while hiding their dynamic placements.
Such an architecture would bring a greater flexibility in the deployment of LHCbDirac services,allowing for easier deployment maintenance and scaling of services on demand (e..g LHCbDirac relies on 138 services and 116 agents). Higher reliability would also be easier, as clustering is part of the toolset, which allows constraints on the location of the services.
This paper describes the investigations carried out to package the LHCbDIRAC and DIRAC components into Docker containers and orchestrate them using the previously described set of tools.
|Primary Keyword (Mandatory)||Computing facilities|
|Secondary Keyword (Optional)||Computing middleware|
|Tertiary Keyword (Optional)||Cloud technologies|