Sep 21 – 25, 2020
(teleconference only)
Europe/Paris timezone

Challenge of the Migration of the RP-Coflu-Cluster @ CERN

Sep 23, 2020, 5:40 PM
20m
https://cern.zoom.us/j/97987309455

HTCondor user presentations Workshop session

Speaker

Xavier Eric Ouvrard (CERN)

Description

The Coflu Cluster, also known as the Radio-Protection (RP) Cluster, started in 2007 as an experimental project at CERN involving a few standard desktop computers. The aim was to provide a job scheduling system and a common storage space so that multiple Fluka simulations could be run in parallel and monitored through a custom-built, easy-to-use web interface.

Abstract The infrastructure is composed of approximately 500 cores and relies on HTCondor, an open-source high-throughput computing framework, to execute Fluka simulation jobs. Before the migration, which was carried out over the last three months, the nodes were running Scientific Linux 6 and HTCondor, mostly the latest release of the HTCondor 7 series. The web interface for job submission, based on JavaScript and PHP, relied heavily on the Quill database hosted in CERN's “database on demand” infrastructure.

In this talk, we discuss the challenges of migrating HTCondor to its latest version on our infrastructure. This required solving several problems: replacing the Quill database, which the web interface used intensively to support job submission and management, and updating the whole system with as little interruption to production as possible, by gradually migrating its components to both the latest version of HTCondor and CentOS 7.

We conclude the presentation with the plan to migrate this infrastructure to the CERN HTCondor pool.

Speaker release Yes
