Nov 4 – 8, 2019
Adelaide Convention Centre
Australia/Adelaide timezone

Big data solutions for CMS computing monitoring and analytics

Nov 5, 2019, 2:15 PM
Riverbank R3 (Adelaide Convention Centre)

Oral, Track 3 – Middleware and Distributed Computing


Federica Legger (Universita e INFN Torino (IT))


The CMS computing infrastructure is composed of several subsystems that accomplish complex tasks such as workload and data management, data transfers, and the submission of user and centrally managed production requests. Until recently, most subsystems were monitored through custom tools and web applications, and logging information was scattered across several sources and typically accessible only to experts. Over the last year, CMS computing has fostered the adoption of common big data solutions based on open-source, scalable, and NoSQL tools such as Hadoop, InfluxDB, and Elasticsearch, available through the CERN IT infrastructure. This system allows monitoring and accounting applications to be deployed easily using visualisation tools such as Kibana and Grafana. Alarms can be raised when anomalous conditions are detected in the monitoring data, and the relevant teams are automatically notified. Data sources from different subsystems are used to build complex workflows and predictive analytics (data popularity, smart caching, transfer latency, …), and for performance studies. We describe the full software architecture and data flow, the CMS computing data sources and monitoring applications, and show how the stored data can be used to gain insights into the various subsystems by exploiting scalable solutions based on Spark.
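The alarm mechanism described in the abstract (raise an alarm when monitoring data meets an anomalous condition, then notify the responsible team) can be illustrated with a minimal plain-Python sketch. It assumes that monitoring records arrive as dictionaries with `metric`, `site`, and `value` fields, and that each rule pairs a metric with a threshold condition and a team to notify. The `AlarmRule` and `Notification` types, the `evaluate` function, the metric names, thresholds, and site names are all hypothetical; this does not reproduce the actual CMS alerting configuration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class AlarmRule:
    """Hypothetical rule: fire when `condition` is true for a metric value."""
    metric: str                          # illustrative metric name
    condition: Callable[[float], bool]   # anomaly test, e.g. a threshold
    team: str                            # team notified when the rule fires


@dataclass(frozen=True)
class Notification:
    """Hypothetical alert record handed to a notification channel."""
    team: str
    metric: str
    site: str
    value: float


def evaluate(records: Iterable[dict], rules: List[AlarmRule]) -> List[Notification]:
    """Check each monitoring record against every rule defined for its metric."""
    alerts = []
    for rec in records:
        for rule in rules:
            if rec["metric"] == rule.metric and rule.condition(rec["value"]):
                alerts.append(
                    Notification(rule.team, rule.metric, rec["site"], rec["value"])
                )
    return alerts


if __name__ == "__main__":
    # Hypothetical rules and records; names and thresholds are illustrative only.
    rules = [
        AlarmRule("transfer_failure_rate", lambda v: v > 0.2, "transfers"),
        AlarmRule("jobs_pending", lambda v: v > 10_000, "workload"),
    ]
    records = [
        {"metric": "transfer_failure_rate", "site": "T2_XX_Site1", "value": 0.35},
        {"metric": "transfer_failure_rate", "site": "T2_XX_Site2", "value": 0.05},
        {"metric": "jobs_pending", "site": "T1_XX_Site3", "value": 12_500},
    ]
    # A real deployment would send e-mail or chat messages instead of printing.
    for alert in evaluate(records, rules):
        print(f"notify {alert.team}: {alert.metric}={alert.value} at {alert.site}")
```

Keeping the rules as data, separate from the evaluation loop, means new checks could be added without changing the code that scans the monitoring records.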

Consider for promotion: Yes

Primary author

Federica Legger (Universita e INFN Torino (IT))
