G. Cancio (CERN)
This paper describes the evolution of fabric management at CERN's T0/T1 Computing Center, from the selection and adoption of prototypes produced by the European DataGrid (EDG) project to the enhancements subsequently made to them. During the last year of the EDG project, developers and service managers worked together to understand and solve operational and scalability issues. CERN has adopted and strengthened Quattor, EDG's installation and configuration management tool suite, to manage all Linux clusters and servers in the Computing Center, replacing legacy management systems. Enhancements to the original prototype include a redundant and scalable server architecture based on proxy technology, and plug-in components for configuring system and LHC computing services. CERN now coordinates the maintenance of Quattor and makes it available to other sites. Lemon, the EDG fabric monitoring framework, has been progressively deployed on all managed Linux nodes. We have developed sensors that instrument fabric nodes and provide complete performance and exception monitoring information. Performance visualization displays and interfaces to the existing alarm system have also been provided. LEAF, the LHC-Era Automated Fabric toolset, comprises two tools: the State Management System, which allows high-level configuration commands to be issued to sets of nodes in both hardware and service management use cases, and the Hardware Management System, which administers hardware workflows and helps visualize and locate equipment. Finally, we describe the issues currently being addressed and the future developments planned.