ALICE HLT Cluster operation during ALICE Run 2
(Johannes Lehrbach) for the ALICE collaboration
ALICE (A Large Ion Collider Experiment) is one of the four major detectors located at the LHC at CERN, focusing on the study of heavy-ion collisions. The ALICE High Level Trigger (HLT) is a compute cluster which reconstructs the events and compresses the data in real time. The data compression by the HLT is a vital part of data taking, especially during the heavy-ion runs, because without it the data could not be stored. The reliability of the whole cluster is therefore essential.
To guarantee a consistent state among all compute nodes of the HLT cluster, we have automated the operation as much as possible. For automatic deployment of the nodes we use Foreman with locally mirrored repositories, and for configuration management of the nodes we use Puppet. Important parameters such as the temperatures of the nodes are monitored with Zabbix.
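As an illustration of what a consistent state among nodes means in practice, the following minimal Python sketch flags nodes whose reported state deviates from the majority. It is not the tooling used on the HLT cluster and does not call Foreman, Puppet, or Zabbix; the function name, node names, package names, and version strings are all hypothetical.

```python
from collections import Counter


def find_drifted_nodes(node_states):
    """Return the sorted names of nodes whose state differs from the most common state.

    node_states maps a node name to a dict of {item: value}, for example
    package versions. The layout is an assumption made for this sketch.
    """
    # Freeze each state into a hashable, order-independent form.
    frozen = {
        node: tuple(sorted(state.items()))
        for node, state in node_states.items()
    }
    if not frozen:
        return []
    # Treat the state shared by the largest number of nodes as the reference.
    reference, _count = Counter(frozen.values()).most_common(1)[0]
    return sorted(node for node, state in frozen.items() if state != reference)


# Hypothetical node names and versions, for illustration only.
example = {
    "node01": {"kernel": "3.10.0", "puppet": "4.10"},
    "node02": {"kernel": "3.10.0", "puppet": "4.10"},
    "node03": {"kernel": "3.10.0", "puppet": "4.8"},
}
drifted = find_drifted_nodes(example)
print(drifted)
```

Using the majority state as the reference is a simplification chosen for the sketch; with a configuration management system, the reference would normally be the declared configuration rather than whatever most nodes happen to run.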
During periods without beam, the HLT cluster is used for tests and as one of the WLCG Grid sites to compute offline jobs in order to maximize the usage of our cluster. To prevent interference with normal HLT operations, we introduced a separation via virtual LANs between the normal HLT operation and the grid jobs running inside virtual machines.
|Secondary Keyword (Optional)||High performance computing|
|Primary Keyword (Mandatory)||Computing facilities|