4-7 September 2018
RAL
Europe/London timezone

Day-to-day HTCondor Operations at RAL

6 Sep 2018, 10:10
25m
CR12, R68 (RAL)

Science and Technology Facilities Council, Rutherford Appleton Laboratory, Harwell Campus, Didcot OX11 0QX, United Kingdom. Tel: +44 (0)1235 445 000. Fax: +44 (0)1235 445 808. Coordinates: N 51° 34' 27.6", W 1° 18' 52.6" (51.57433, -1.31462)
HTCondor presentations and tutorials Workshop presentations

Speaker

John Kelly (S)

Description

The RAL Tier-1 originally used the PBS batch system for its Grid-related activities. Increased LHC operational requirements exposed scalability problems, so other batch systems were considered.

In this presentation we review the history of HTCondor at RAL and describe how it evolved from an initial conventional setup, with cgroups for resource control, to the current use of Docker containers, which presents its own set of challenges.

We describe the integration of the batch farm with the Ceph storage system by means of dedicated Docker containers, and we discuss our experience with jobs bursting into the RAL cloud.

The presentation also covers our consolidation plans and future needs, in particular ensuring a sustained number of multicore jobs on the batch farm.

Primary author

Co-author

Catalin Condurache (Science and Technology Facilities Council STFC (GB))
