Apr 18 – 22, 2016
DESY Zeuthen
Europe/Berlin timezone

Monitoring at scale: a needle in the haystack

Apr 21, 2016, 5:00 PM
Seminar room 3 (DESY Zeuthen)

Seminar room 3

DESY Zeuthen

Platanenallee 6, 15738 Zeuthen (near Berlin), Germany
Basic IT Services Basic IT services


Fabien Wernli (CCIN2P3)


Many of today's opensource monitoring tools have grown to distributed, horizontally scaling solutions. When designing a new infrastructure, choosing and configuring the right software stack to analyze and record logs and metrics can admittedly still be a challenge, but we are no longer restricted to the vertically scaling rrdtool-type timeseries storage. The real challenge is the amount of data a monitored system can produce, and the difficulty to process it without classifying and tagging it appropriately. We explain the necessity to attach relevant metadata to monitored events in order to offer a solution to the needle-in-the-haystack problem that affects large datasets. Through practical ideas and use-cases at CCIN2P3 we underline the capital importrance of metadata to leverage the power to query the underlying indexing backend. Aggregating metrics and querying logs against technical or business-oriented key-value pairs are a powerful way to answer questions, and provide high-level alerts. We present the current solution of managing log events at CCIN2P3 as well as the upcoming metric solution. The primary focus is on the tool stack and experience we gathered during the last decade. The current monitoring stack is based on facter (puppet), collectd, riemann, syslog-ng and elasticsearch and has successfully been used in production at CCIN2P3 on its 2 datacenters with roughly 1500 monitored nodes and 12'000 events per second on average.
Length of presentation (minutes, max. 20) 45

Primary author

Fabien Wernli (CCIN2P3)

Presentation materials