Speaker
Janos Daniel Pek
(CERN)
Description
The CERN Batch System comprises 4000 worker nodes, and its 60 queues serve various types of large user communities. In light of recent developments driven by the Agile Infrastructure and more demanding processing requirements, the Batch System will face increasingly challenging scalability and flexibility needs. Last year the CERN Batch Team started to evaluate three candidate batch systems: SLURM, HTCondor and GridEngine. This year, as we are reaching a conclusion, HTCondor is one of our candidates. In this talk we give a short reminder of our requirements and our preliminary results from last year. We then focus on HTCondor: our experience with it thus far, our testing framework and the results of our performance tests. Finally, we summarize the foreseeable challenges we would face if we decide to migrate the CERN Batch Service to HTCondor.
Author
Janos Daniel Pek
(CERN)