Monitoring system for the Belle II distributed computing

10 Jul 2018, 11:45
Belle II is an asymmetric-energy e+e− collider experiment at KEK, Japan. It aims to reveal physics beyond the Standard Model with a data set of about 5×10^10 BB̄ pairs, and it starts its physics run in 2018. To store such a huge amount of data, including simulated events, and to analyze it in a timely manner, Belle II adopts a distributed computing model based on DIRAC (Distributed Infrastructure with Remote Agent Control).

The monitoring system for Belle II computing is developed as an extension of DIRAC. It collects and analyzes information on job processing and data transfers stored in the DIRAC database, and then visualizes it. We have also developed a system that regularly performs accessibility tests on various components, such as computing and storage elements, the database storing calibration information, and the DIRAC servers. Finally, the system performs an overall health check of Belle II computing by combining all of the analysis and test results. Detected issues are displayed in a single place so that even non-expert shifters can easily find problems.
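The final health-check step, which reduces many periodic accessibility tests and job-processing statistics to one view for shifters, can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the component names, the stand-in probe functions, the helper names (`run_probes`, `check_job_failures`, `overall_health`), and the failure-rate thresholds are hypothetical and are not taken from the Belle II or DIRAC code. It only shows one plausible way to combine results into a single worst-case status plus a list of issues.

```python
"""Hypothetical sketch: reduce periodic accessibility tests and a job-failure
check to one overall health status plus a list of issues for shifters."""
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Dict, List, Tuple


class Status(IntEnum):
    # Ordered by severity so that max() picks the worst status.
    OK = 0
    WARNING = 1
    ERROR = 2


@dataclass
class ProbeResult:
    component: str  # hypothetical component label, e.g. a storage element
    status: Status
    message: str = ""


def run_probes(probes: Dict[str, Callable[[], bool]]) -> List[ProbeResult]:
    """Run each accessibility probe once; False or an exception counts as ERROR."""
    results = []
    for name, probe in probes.items():
        try:
            ok = probe()
        except Exception as exc:  # an unreachable service often surfaces as an exception
            results.append(ProbeResult(name, Status.ERROR, f"probe raised {exc!r}"))
            continue
        if ok:
            results.append(ProbeResult(name, Status.OK))
        else:
            results.append(ProbeResult(name, Status.ERROR, "probe returned False"))
    return results


def check_job_failures(name: str, failed: int, total: int,
                       warn: float = 0.10, error: float = 0.30) -> ProbeResult:
    """Turn job counts into a status; the thresholds are assumed, not Belle II values."""
    if total == 0:
        return ProbeResult(name, Status.OK, "no jobs in window")
    rate = failed / total
    if rate >= error:
        status = Status.ERROR
    elif rate >= warn:
        status = Status.WARNING
    else:
        status = Status.OK
    return ProbeResult(name, status, f"failure rate {rate:.0%}")


def overall_health(results: List[ProbeResult]) -> Tuple[Status, List[ProbeResult]]:
    """Overall status is the worst individual status; issues are all non-OK results."""
    worst = max((r.status for r in results), default=Status.OK)
    issues = [r for r in results if r.status != Status.OK]
    return worst, issues


if __name__ == "__main__":
    # Stand-in probes; real ones would contact the services being tested.
    probes = {
        "storage-element-A": lambda: True,
        "computing-element-B": lambda: False,
        "calibration-db": lambda: True,
    }
    results = run_probes(probes)
    results.append(check_job_failures("job-processing", failed=15, total=100))
    status, issues = overall_health(results)
    for r in issues:
        print(f"{r.component}: {r.status.name} ({r.message})")
    print("overall:", status.name)
```

Running it prints the two non-OK entries (`computing-element-B: ERROR (probe returned False)` and `job-processing: WARNING (failure rate 15%)`) followed by `overall: ERROR`. Taking the worst status as the overall verdict is a deliberately conservative choice in this sketch: a single failing component is enough to flag the system, and the accompanying issue list tells the shifter where to look.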

In this contribution, we present the details of the monitoring system as well as our experience during the simulation data production campaign and the first few months of data taking.

Kiyoshi Hayasaka, Takanori Hara (High Energy Accelerator Research Organization (JP)), I Ueda (KEK IPNS), Hideki Miyake (KEK)

