19–23 May 2014
LAPP
Europe/Paris timezone

Cluster Consolidation at NERSC

22 May 2014, 17:25
25m
Auditorium Marcel Vivargent (LAPP)

Basic IT Services

Speaker

Larry Pezzaglia (LBNL)

Description

This talk will provide a case study of cluster consolidation at NERSC. In 2012, NERSC began deploying "Mendel", a 500+-node, InfiniBand-attached Linux "meta-cluster" that transparently expands NERSC production clusters and services in a scalable and maintainable fashion. The success of the software automation infrastructure behind the Mendel multi-clustering model encouraged investigation into even more aggressive consolidation efforts. This talk will detail one such effort: under the constraints of a 24x7, disruption-sensitive environment, NERSC staff merged a 400-node legacy production cluster, comprising multiple hardware generations and ad-hoc software configurations, into Mendel's automation infrastructure. By combining the hierarchical management features of the xCAT software package with other open-source and in-house tools, such as CFEngine and CHOS, NERSC hid the unique characteristics of both clusters beneath a unified management interface. As a result, both cluster components are now managed as a single, albeit complex, integrated system. This talk will also provide an update on the PDSF system at NERSC, including improvements to trending data collection and ongoing CHOS development.
