With the start of the LHC, high-energy physics researchers will begin making heavy use of the LHC Tier-2 (T2) sites. It is essential to give physics user groups a simple and intuitive “user-level” summary of the status of their associated T2 services, showing for example which resources are available, busy or unavailable. At the same time, site administrators need “technical-level” monitoring: a detailed view of service parameters and statistics, with event notification, in order to guarantee full service availability and reliability. Classic cluster monitoring tools only partially cover the needs of a T2. The supplementary tools we developed to control our farm (essentially a set of Bash cron jobs) have grown, day by day, into a full-fledged new monitoring infrastructure (Mon2) that provides both technical views (RAID systems, network, temperature, operating system parameters, etc.) and user-level views (central availability tests, job monitoring, etc.). A central server collects the information sent by the hosts, publishes it on the Web and via an RSS feed, and sends the configured alarms via e-mail and SMS; parameters of interest can be stored locally in a flat-file transactional SQL database engine to generate plots and to help with troubleshooting and forensics. Security and ease of management are achieved by using public-key authentication for data exchange among hosts, pure HTML for the Web interface, and no database servers. The exclusive use of Perl and Bash scripting ensures that site administrators can customize sensors, accounting and e-mail notifications for their own site.
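The collect, store and alarm flow described above can be illustrated with a minimal sketch. It is not the Mon2 implementation, which is written in Perl and Bash and is not shown here; Python is used purely for illustration. The sketch assumes SQLite as a stand-in for the flat-file transactional SQL engine, and the table layout, the function names (`open_store`, `record_reading`, `history`), the sensor name and the threshold dictionary are all hypothetical.

```python
import sqlite3
import time


def open_store(path=":memory:"):
    """Open a flat-file SQL store (SQLite is an assumption, not named in the text)."""
    conn = sqlite3.connect(path)
    # Hypothetical schema: one row per reading reported by a host.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        "host TEXT NOT NULL, sensor TEXT NOT NULL, "
        "value REAL NOT NULL, ts REAL NOT NULL)"
    )
    return conn


def record_reading(conn, host, sensor, value, thresholds, ts=None):
    """Store one reading; return an alarm message if it exceeds its threshold."""
    ts = time.time() if ts is None else ts
    # The connection context manager commits on success and rolls back on error.
    with conn:
        conn.execute(
            "INSERT INTO readings VALUES (?, ?, ?, ?)",
            (host, sensor, float(value), ts),
        )
    limit = thresholds.get(sensor)
    if limit is not None and value > limit:
        # A real server would hand this message to an e-mail or SMS sender.
        return f"ALARM {host} {sensor}={value} exceeds {limit}"
    return None


def history(conn, host, sensor):
    """Return (timestamp, value) pairs in time order, e.g. as input for a plot."""
    rows = conn.execute(
        "SELECT ts, value FROM readings WHERE host = ? AND sensor = ? ORDER BY ts",
        (host, sensor),
    )
    return rows.fetchall()
```

In this sketch each stored reading is a single transaction on a local file, so no database server has to be run or secured, which is in line with the management goal stated above.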
