Speaker
Andrew McNab
(University of Manchester (GB))
Description
The LHCb experiment has been running production jobs in virtual machines since 2013 as part of its DIRAC-based infrastructure. We describe the architecture of these virtual machines and the steps taken to replicate the WLCG worker node environment expected by user and production jobs. This relies on the CernVM 3 system to provide root images for the virtual machines. We use the cvmfs distributed filesystem to supply the root partition files, the LHCb software stack, and the bootstrapping scripts needed to configure the virtual machines for us. With this approach, we have been able to minimise the amount of contextualisation that must be provided by the virtual machine managers. We explain how a virtual machine receives payload jobs submitted to DIRAC by users and production managers, and how this differs from payloads executed within conventional DIRAC pilot jobs on batch-queue-based sites. We compare our operational experiences of running production on VM-based sites managed using OpenStack, Vac, BOINC, and Condor. Finally, we describe the monitoring requirements that arise because experiments operating virtual machines take on responsibilities previously undertaken by the system managers of worker nodes, and how meeting these requirements is facilitated by the new DIRAC Pilot 2.0 architecture.
Primary author
Andrew McNab
(University of Manchester (GB))
Co-authors
Cinzia Luzzi
(CERN)
Federico Stagni
(CERN)