21-25 May 2012
New York City, NY, USA
Hunting for hardware changes in data centers.

22 May 2012, 13:30
Miguel Coelho Dos Santos (CERN)


With many servers and server parts, the environment of warehouse-sized data centers is increasingly complex. Server life-cycle management and hardware failures cause frequent changes that need to be managed. To manage these changes better, a project codenamed "hardware hound", focusing on hardware failure trending and hardware inventory, has been started at CERN. The project creates and uses a hardware-oriented data set, the inventory, with detailed information on servers and their parts, firmware levels, and other server-related data, e.g. rack location, benchmarked processing performance and power consumption, warranty coverage, purchase order, and deployment state (production, maintenance). By also tracking changes to this inventory, the project aims, for example, to discover trends in hardware failure rates, such as a lower mean time to failure of a given component in a given batch of servers. This contribution will describe the architecture of the project, the inventory data, and real-life use cases.
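
As an illustration only, the two ideas in the abstract (tracking changes to an inventory and trending failures per batch) might be sketched as follows. This is not the project's actual schema or code: the snapshot layout, the field names (`serial`, `firmware`, `batch`, `component`, `days_in_service`), and both function names are assumptions made for the example. The per-batch average also ignores parts still in service, so it is only a rough indicator rather than a proper mean-time-to-failure estimate.

```python
from collections import defaultdict
from statistics import mean


def diff_inventory(old, new):
    """Compare two inventory snapshots.

    Each snapshot is a dict mapping a hypothetical (hostname, slot) key
    to a part record. Returns (added, removed, changed).
    """
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return added, removed, changed


def mean_days_to_failure(failures):
    """Average observed days in service before failure,
    grouped by (batch, component). Field names are assumed."""
    grouped = defaultdict(list)
    for record in failures:
        key = (record["batch"], record["component"])
        grouped[key].append(record["days_in_service"])
    return {key: mean(days) for key, days in grouped.items()}


# Made-up example data: a disk swap and a DIMM moved to another slot.
old_snapshot = {
    ("node01", "disk0"): {"serial": "A1", "firmware": "1.0"},
    ("node01", "dimm0"): {"serial": "M1", "firmware": None},
}
new_snapshot = {
    ("node01", "disk0"): {"serial": "B7", "firmware": "1.2"},
    ("node01", "dimm1"): {"serial": "M1", "firmware": None},
}
added, removed, changed = diff_inventory(old_snapshot, new_snapshot)

# Made-up failure log for two server batches.
failure_log = [
    {"batch": "X", "component": "disk", "days_in_service": 100},
    {"batch": "X", "component": "disk", "days_in_service": 200},
    {"batch": "Y", "component": "disk", "days_in_service": 400},
]
trend = mean_days_to_failure(failure_log)
```

In this toy data, disks in batch "X" fail after fewer days on average than those in batch "Y", which is the kind of signal the abstract describes wanting to surface.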

