-
Mr Wojciech Lapka (Unknown), 28/10/2009, 10:30
Monitoring Infrastructure and Tools
Since 2005, Worldwide LHC Computing Grid (WLCG) services have been monitored by the Service Availability Monitoring (SAM) system, which has been the main source of information for the monthly WLCG availability and reliability calculations. During this time, the SAM framework gained popularity among site and service managers and proved very useful in building a robust grid infrastructure. Experience...
-
Mr Thomas Davis (NERSC/LBNL), 28/10/2009, 11:00
Monitoring Infrastructure and Tools
We present a method of monitoring the environment and performance using open source tools such as Nagios, Ganglia and Cacti to collect and display performance data as well as availability information for various components of large computing systems in an integrated fashion. We will present information on how the data is collected, viewed and analyzed, with specific examples from NERSC's Cray system.
-
Frédéric AZEVEDO (CC-IN2P3), 28/10/2009, 11:30
Due to the continuous load and intensive usage of our robotics, we regularly face hardware issues with tapes and tape drives. A recurrent issue is possible data loss, which forces us through a long recovery process. To improve our reliability, we have studied commercial solutions that avoid permanent write/read errors, or at least anticipate errors before they occur. We've tested...