The ATLAS Trigger and Data Acquisition (TDAQ) is a large, distributed
system composed of several thousand interconnected computers and tens
of thousands of software processes (applications). These applications produce a
large volume of operational messages (of the order of 10^4 messages
per second), which must be reliably stored, delivered to TDAQ
operators in real time, and kept available for post-mortem
analysis by experts.
We have selected Splunk, a commercial product by Splunk Inc., as an
all-in-one solution: an indexed database for storing different types of
operational data, and a web-based framework for searching and
presenting the indexed data and for rapidly developing user-oriented
dashboards accessible in a web browser.
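The abstract does not say how messages reach the Splunk index. One common route for external producers is Splunk's HTTP Event Collector (HEC), which accepts JSON events over HTTPS with an "Authorization: Splunk <token>" header. The sketch below is a hedged illustration under that assumption, not the TDAQ implementation: the host name, token, index name "tdaq_messages", and the helper build_hec_request are all hypothetical. It only builds the request, so it runs without a Splunk server.

```python
import json
import time
import urllib.request

# Hypothetical placeholders: not real TDAQ endpoints or credentials.
HEC_URL = "https://splunk.example.org:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"


def build_hec_request(application, severity, text, index="tdaq_messages"):
    """Wrap one operational message in an HEC event request (built, not sent)."""
    payload = {
        "time": time.time(),      # event timestamp in epoch seconds
        "source": application,    # name of the emitting application
        "sourcetype": "_json",    # ask Splunk to parse the event body as JSON
        "index": index,           # hypothetical target index
        "event": {"severity": severity, "text": text},
    }
    return urllib.request.Request(
        HEC_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {HEC_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_hec_request("example_app", "ERROR", "Timeout waiting for data fragment")
# urllib.request.urlopen(req) would submit it to a reachable Splunk instance.
```

At rates of the order of 10^4 messages per second, a producer would normally batch events rather than send one request per message. HEC accepts several JSON event objects concatenated in a single request body.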
The paper describes the capabilities of the Splunk framework, the use cases,
and the applications and web dashboards developed to facilitate the
browsing and searching of TDAQ operational data by TDAQ operators and