Speaker
Description
RAL Tier-1 originally used the PBS batch system for its Grid-related activities. Increased LHC operational requirements exposed scalability problems, so other batch systems were considered.
In this presentation we review the history of HTCondor at RAL and detail how it evolved from an initial conventional setup with cgroups for resource control to the current use of Docker containers, which presents its own set of challenges.
We describe the integration of the batch farm with the Ceph storage system by means of dedicated Docker containers, and we discuss our experience with jobs bursting into the RAL cloud.
The presentation also covers our consolidation plans, including future needs, in particular ensuring a sustained number of multicore jobs on the batch farm.