CHEP 2016 Conference, San Francisco, October 8-14, 2016

Name: CHEP 2016 Conference, San Francisco, October 8-14, 2016
Start: 2016-10-10T08:00:00-07:00
End: 2016-10-14T18:00:00-07:00
Location: San Francisco Marriott Marquis

America/Los_Angeles timezone

Next Generation Monitoring

13 Oct 2016, 15:30

1h 15m

Poster Track 7: Middleware, Monitoring and Accounting Posters B / Break

Speaker: Robert Fay (University of Liverpool (GB))

Monitoring of IT infrastructure and services is essential to maximize availability and minimize disruption: failures and developing issues must be detected early enough to allow rapid intervention.

The HEP group at Liverpool have been working on a project to modernize the local monitoring infrastructure (previously provided by Nagios and Ganglia), with the goals of increasing coverage, improving visualization capabilities, and streamlining configuration and maintenance. Here we discuss some of the tools evaluated, the different approaches they take, and how they can be combined to complement each other and form a comprehensive monitoring infrastructure. We present an overview of the resulting system and the progress made on its implementation to date. The current state is as follows:

The system is configured with Puppet. Basic system checks are configured in Puppet using Hiera, and managed by Sensu. Centralised logging is managed with Elasticsearch, together with Logstash and Filebeat. Kibana provides an interface for interactive analysis, including visualization and dashboards. Metric collection is also configured in Puppet, with Ganglia, Sensu, riemann.io, and collectd amongst the tools being considered. Metrics are sent to Graphite, with Grafana providing visualization and dashboards. Additional checks on the collated logs and on metric trends are also configured in Puppet and managed by Sensu.
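As a rough illustration of how a single script can serve both the check and the metric paths described above, the following Python sketch maps a disk usage figure to an exit status using the Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL), which Sensu checks also follow, and formats the same figure as a line in Graphite's plaintext protocol. It is not taken from the Liverpool configuration: the function names, the thresholds, and the `example.host` metric prefix are all assumptions made for the example.

```python
import shutil
import time

# Exit statuses follow the Nagios plugin convention, which Sensu checks also use.
OK, WARNING, CRITICAL = 0, 1, 2


def disk_status(used_fraction, warn=0.80, crit=0.90):
    """Map a disk usage fraction (0.0-1.0) to a check exit status.

    The default thresholds are illustrative assumptions.
    """
    if used_fraction >= crit:
        return CRITICAL
    if used_fraction >= warn:
        return WARNING
    return OK


def graphite_line(path, value, timestamp):
    """Format one metric in Graphite's plaintext protocol: 'path value timestamp'."""
    return f"{path} {value} {int(timestamp)}"


def check_disk(mount="/", prefix="example.host"):
    """Measure usage of one mount point; return (exit status, metric line).

    'prefix' is a hypothetical metric namespace, not one from the abstract.
    """
    usage = shutil.disk_usage(mount)
    used_fraction = usage.used / usage.total
    status = disk_status(used_fraction)
    line = graphite_line(
        f"{prefix}.disk.used_fraction", round(used_fraction, 4), time.time()
    )
    return status, line
```

A wrapper script would print the metric line and exit with the returned status. In a setup like the one described, the thresholds would presumably come from Hiera data rather than being hard-coded.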

The Uchiwa dashboard for Sensu provides a web interface for viewing infrastructure status. Alert capabilities are provided via external handlers. Liverpool are developing a custom handler to provide an easily configurable, extensible and maintainable alert facility.
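To make the handler mechanism concrete: a Sensu pipe handler is an external program that receives the event as JSON on standard input. The Python sketch below shows the general shape of such a handler and is not the custom handler under development at Liverpool. It assumes the event carries `client.name`, `check.name`, `check.status` and `check.output` fields, and the message layout and the `format_alert` and `handle` names are hypothetical.

```python
import json
import sys

# Human-readable names for the standard check exit statuses.
STATUS_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL"}


def format_alert(event):
    """Build a one-line alert message from a Sensu-style event dictionary."""
    client = event.get("client", {}).get("name", "unknown-client")
    check = event.get("check", {})
    name = check.get("name", "unknown-check")
    # Any status outside 0-2 (or a missing status) is reported as UNKNOWN.
    status = STATUS_NAMES.get(check.get("status"), "UNKNOWN")
    output = str(check.get("output", "")).strip()
    return f"[{status}] {client}/{name}: {output}"


def handle(stream=sys.stdin, out=sys.stdout):
    """Read one JSON event from 'stream' and write the alert line to 'out'."""
    event = json.load(stream)
    out.write(format_alert(event) + "\n")
```

A real handler would call `handle()` from its entry point and replace the write to standard output with a notification channel such as email or chat. Keeping the formatting in a separate function makes it possible to test the handler without a running Sensu server.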

Primary Keyword (Mandatory): Monitoring
Secondary Keyword (Optional): Computing facilities

Authors: John Bland (University of Liverpool), Robert Fay (University of Liverpool (GB))

Co-author: Stephen Jones (Liverpool University)

Highlights-75.pdf

Poster-75.pdf
