An integrated monitoring system for Grid Data Centers
Presented by Prof. Guido RUSSO on 12 Apr 2010 from 17:51 to 17:54
Session: Poster session
Track: Software services exploiting and/or extending grid middleware (gLite, ARC, UNICORE etc)
We present a monitoring system developed for the Data Centers used by the SCoPE and ATLAS projects in Napoli, Italy. The system is based on a portlet container that gives an integrated view of the Data Center and allows graphical, hierarchically organized navigation through all the equipment, from the racks down to the active components. The system monitors the whole infrastructure (UPS, cooling, electrical power consumption at the single-socket level), the network (Gigabit, 10 Gigabit, Infiniband, Fibre Channel) and, of course, the storage and server nodes.
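The hierarchical navigation described above can be illustrated with a minimal sketch. It assumes a simple tree in which each node (data center, rack, component) has its own status and a parent shows the worst status found beneath it. The class names, status levels and equipment names are hypothetical and are not taken from the actual system.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Iterator, List, Tuple


class Status(IntEnum):
    """Hypothetical severity levels; a higher value is worse."""
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass
class Node:
    """One element of the hierarchy: data center, rack, or component."""
    name: str
    own_status: Status = Status.OK
    children: List["Node"] = field(default_factory=list)

    def status(self) -> Status:
        """Worst status among this node and everything below it."""
        return max([self.own_status] + [c.status() for c in self.children])

    def problems(self, prefix: str = "") -> Iterator[Tuple[str, Status]]:
        """Yield (path, status) for every node whose own status is not OK."""
        path = f"{prefix}/{self.name}" if prefix else self.name
        if self.own_status != Status.OK:
            yield path, self.own_status
        for child in self.children:
            yield from child.problems(path)


if __name__ == "__main__":
    dc = Node("datacenter", children=[
        Node("rack-01", children=[Node("ups-a"),
                                  Node("server-03", Status.CRITICAL)]),
        Node("rack-02", children=[Node("switch-1", Status.WARNING)]),
    ])
    print(dc.status().name)
    for path, status in dc.problems():
        print(path, status.name)
```

With this kind of roll-up, a top-level view can flag a rack as soon as any component inside it degrades, and the operator can then drill down to the failing element.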
We approach the integration of monitoring applications by implementing both portal integration (PI) and enterprise information integration (EII). The work followed this path: (i) identification of the functionalities that need to be monitored, along with their characteristics; (ii) identification of the commercial or self-made software products already available, if any; (iii) identification of the available open-source software; and (iv) development of dedicated plug-ins and Java web applications. The work was motivated by the need for a dedicated troubleshooting and monitoring system for the newly built Data Center (opened in February 2009), for which we needed a robust, flexible and easily adaptable system, without the complexity of the rarely used functionalities often present in commercial systems. The job soon evolved, however, into the construction of a true portal system for the Data Center, which will soon become the single point of entry for all kinds of access: novice users, expert users, the management team, and the referee group that plays a role in future funding.
The system we built is an integrated one, giving most users a single way of accessing the grid Data Center. For the management team in particular, it allows a thorough integration of all monitoring subsystems and can therefore easily accommodate new hardware and software tools, following the evolution of the Data Center. For first-level monitoring, our approach provides a simplified view of the infrastructure, so that operator shifts do not require highly qualified personnel. We wrote our own modules for much of the equipment, but we also integrated existing applications such as GridICE, Ganglia, and POWER FARM. In our opinion, such an integration strategy is very useful in a situation like that of EGEE, in which a single tool is not enough for exhaustive management of all aspects of the Data Center.
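The integration strategy above can be sketched as a common adapter interface behind which each monitoring subsystem sits. This is only an illustration under assumptions: the `MonitoringSource` protocol, the `Reading` record and the `StaticSource` stand-in are hypothetical names, and the sketch does not call the real interfaces of GridICE, Ganglia or any other tool.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Protocol, Tuple


@dataclass(frozen=True)
class Reading:
    """One measurement, tagged with the subsystem that produced it."""
    source: str
    target: str   # e.g. a host or a rack name
    metric: str
    value: float


class MonitoringSource(Protocol):
    """Hypothetical adapter interface each integrated tool would implement."""
    name: str

    def poll(self) -> Iterable[Reading]: ...


class StaticSource:
    """Stand-in adapter returning canned data instead of querying a real tool."""

    def __init__(self, name: str, data: Dict[Tuple[str, str], float]) -> None:
        self.name = name
        self._data = data

    def poll(self) -> Iterable[Reading]:
        for (target, metric), value in self._data.items():
            yield Reading(self.name, target, metric, value)


def collect(sources: Iterable[MonitoringSource]) -> Dict[str, List[Reading]]:
    """Merge readings from all sources into one view keyed by target."""
    view: Dict[str, List[Reading]] = {}
    for source in sources:
        for reading in source.poll():
            view.setdefault(reading.target, []).append(reading)
    return view


if __name__ == "__main__":
    sources = [
        StaticSource("cluster-monitor", {("node-01", "load"): 0.7}),
        StaticSource("power-monitor", {("node-01", "watts"): 310.0,
                                       ("rack-02", "watts"): 4200.0}),
    ]
    for target, readings in collect(sources).items():
        print(target, [(r.source, r.metric, r.value) for r in readings])
```

Adding a new tool then means writing one more adapter, while the portal view built on `collect` stays unchanged.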
More modules and functionalities are being implemented, and existing ones that were initially integrated off the shelf are being optimized, e.g. by using Struts as a development framework for Java web applications. As an example, we are actively working on a tool for automatic emergency shutdown and restart, and for scheduled shutdown and restart during maintenance downtimes, on a per-rack basis.
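A per-rack shutdown and restart plan of the kind mentioned above could be sketched as follows. The ordering used here (servers first, then storage, then network equipment, with restart in the reverse order) is an assumption made for illustration and is not stated in the abstract. The device kinds and names are hypothetical too.

```python
from typing import List, Tuple

# Assumed shutdown priority: earlier kinds are powered off first.
SHUTDOWN_ORDER = ("server", "storage", "network")

Device = Tuple[str, str]  # (name, kind)


def shutdown_plan(rack: List[Device]) -> List[Device]:
    """Order the devices of one rack for shutdown.

    sorted() is stable, so devices of the same kind keep their input order.
    """
    return sorted(rack, key=lambda device: SHUTDOWN_ORDER.index(device[1]))


def restart_plan(rack: List[Device]) -> List[Device]:
    """Restart in the reverse of the shutdown order."""
    return list(reversed(shutdown_plan(rack)))


if __name__ == "__main__":
    rack = [("switch-1", "network"), ("node-01", "server"),
            ("disk-array-1", "storage"), ("node-02", "server")]
    print("shutdown:", [name for name, _ in shutdown_plan(rack)])
    print("restart: ", [name for name, _ in restart_plan(rack)])
```

A real tool would also have to issue the power commands and confirm that each step completed before moving on. The sketch covers only the ordering.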
monitoring - grid - data center
Location: Uppsala University
- Prof. Guido RUSSO Universita' Federico II & INFN, Napoli
- Dr. Giovanni Battista BARONE Universita' Federico II, Napoli
- Dr. Vania BOCCIA Universita' Federico II, Napoli
- Dr. Gianpaolo CARLINO INFN, Napoli