Evaluation of NoSQL databases for DIRAC monitoring and beyond

13 Apr 2015, 14:45
15m
C209 (C209)

oral presentation Track3: Data store and access Track 3 Session

Speaker

Federico Stagni (CERN)

Description

Many database systems are available today, but they may not be optimized for storing time series data. DIRAC job monitoring is a typical use case of such time series. So far it has been done using a MySQL database, which is not well suited to such an application; alternatives have therefore been investigated. Choosing an appropriate database for storing huge amounts of time series data is not trivial, as one must take into account different aspects such as manageability, scalability and extensibility. We compared the performance of three time series NoSQL databases, Elasticsearch, OpenTSDB (which is based on HBase) and InfluxDB, using the same set of machines and the same data. We also evaluated the effort required to maintain them. Using the LHCb Workload Management System, based on DIRAC, as a use case, we have set up a new monitoring system in parallel with the current MySQL system, and we publish the same data into the databases under test. We have evaluated the Grafana (for OpenTSDB) and Kibana (for Elasticsearch) metrics and graph editors for creating dashboards, in order to have a clear picture of the usability of each candidate. In this paper we present the results of this study and the performance of the selected technology. We also give an outlook on other potential applications of NoSQL databases within the DIRAC project.
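
As an illustration of what publishing the same data into each database under test can involve, the following minimal sketch serializes one monitoring data point into three write formats: InfluxDB line protocol, the OpenTSDB telnet-style `put` command, and an Elasticsearch bulk-index payload. It is not taken from the DIRAC code base. The metric name, tag names, index name and function names are all hypothetical. The sketch assumes integer values and tag values without spaces, commas or equals signs (no escaping is done), and it omits the document type that older Elasticsearch releases also expect.

```python
import json


def to_influx_line(measurement, tags, fields, ts_seconds):
    """InfluxDB line protocol: measurement,tags fields timestamp (nanoseconds)."""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    # The trailing 'i' marks an integer field value in line protocol.
    field_part = ",".join(f"{k}={v}i" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_part} {field_part} {ts_seconds * 10**9}"


def to_opentsdb_put(metric, tags, value, ts_seconds):
    """OpenTSDB telnet-style command: put <metric> <timestamp> <value> <tags>."""
    tag_part = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"put {metric} {ts_seconds} {value} {tag_part}"


def to_es_bulk(index, tags, value, ts_seconds):
    """Elasticsearch bulk payload: one action line, then one document line."""
    action = {"index": {"_index": index}}
    # Hypothetical flat document layout; the timestamp is kept as epoch seconds.
    doc = dict(tags, timestamp=ts_seconds, value=value)
    return json.dumps(action) + "\n" + json.dumps(doc) + "\n"


# Hypothetical sample: number of running jobs at one site.
sample_tags = {"site": "LCG.CERN.ch", "status": "Running"}
print(to_influx_line("wms_jobs", sample_tags, {"value": 1250}, 1428929100))
print(to_opentsdb_put("wms.jobs", sample_tags, 1250, 1428929100))
print(to_es_bulk("wms-jobs", sample_tags, 1250, 1428929100), end="")
```

Building all three payloads from one record keeps the inputs identical across back ends, which is the property a like-for-like comparison depends on.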

Primary author

Co-authors

Adrian Casajus Ramo (University of Barcelona (ES)) Federico Stagni (CERN) Luca Tomassetti (University of Ferrara and INFN)