Description
The ATLAS Trigger and Data Acquisition (TDAQ) system is large and
distributed, comprising several thousand interconnected computers and
tens of thousands of software processes (applications). These
applications produce a large volume of operational messages (on the
order of 10^4 messages per second), which must be reliably stored and
delivered to TDAQ operators in real time, and must also remain
available for post-mortem analysis by experts.
We have selected Splunk, a commercial product by Splunk Inc., as an
all-in-one solution: it stores different types of operational data in
an indexed database and provides a web-based framework for searching
and presenting the indexed data and for rapidly developing
user-oriented dashboards accessible in a web browser.
The paper describes the capabilities of the Splunk framework, its use
cases, and the applications and web dashboards developed to help TDAQ
operators and experts browse and search TDAQ operational data.