10–14 Oct 2016
San Francisco Marriott Marquis
America/Los_Angeles timezone

Evolution, design, management and support for the CMS Online computing cluster

11 Oct 2016, 15:30
1h 15m
San Francisco Marriott Marquis

Poster Track 1: Online Computing Posters A / Break

Description

After two years of maintenance and upgrades, the Large Hadron Collider (LHC) has started its second four-year run. In the meantime, the CMS experiment at the LHC has also undergone two years of maintenance and upgrades, especially in the Data Acquisition (DAQ) and online computing cluster, where the system was largely redesigned and replaced. Various aspects of the supporting computing system will be addressed here.

The increasing processing power and the use of high-end networking technologies (10/40 Gb/s Ethernet and 56 Gb/s InfiniBand) have reduced the number of DAQ event-building nodes, since the performance of individual nodes has increased by an order of magnitude since the start of the LHC. The pressure to use the systems optimally has increased accordingly, which in turn raises the importance of proper configuration and careful monitoring to catch any deviation from standard behaviour. The upgraded monitoring system, based on Ganglia and Icinga2, will be presented together with the different mechanisms used to monitor and troubleshoot the crucial elements of the system.
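The abstract does not describe the individual checks. As a minimal sketch, assuming the standard monitoring-plugin convention that Icinga2 uses for check commands (exit code 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN, with optional performance data after a `|`), a threshold check that flags a deviation from standard behaviour on a 40 Gb/s link could look like the following. The function names, the link-utilisation metric and the 80%/95% thresholds are hypothetical illustrations, not taken from the CMS setup.

```python
# Standard monitoring-plugin states understood by Icinga2 (and Nagios).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(value, warn, crit):
    """Map a 'higher is worse' metric to a plugin state using two thresholds."""
    if value is None:
        return UNKNOWN
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def check_link_utilisation(used_gbps, capacity_gbps, warn_pct=80.0, crit_pct=95.0):
    """Hypothetical check: flag a network link whose utilisation deviates from normal."""
    if capacity_gbps <= 0:
        return UNKNOWN, "UNKNOWN - invalid link capacity"
    pct = 100.0 * used_gbps / capacity_gbps
    state = classify(pct, warn_pct, crit_pct)
    # Text before '|' is shown to operators; text after it is performance data
    # in the form label=value[UOM];warn;crit;min;max.
    message = (
        f"{STATE_NAMES[state]} - link at {pct:.1f}% "
        f"| util={pct:.1f}%;{warn_pct};{crit_pct};0;100"
    )
    return state, message


def main():
    """Example entry point with hard-coded, hypothetical readings (36 of 40 Gb/s)."""
    state, message = check_link_utilisation(used_gbps=36.0, capacity_gbps=40.0)
    print(message)  # prints "WARNING - link at 90.0% | util=90.0%;80.0;95.0;0;100"
    return state
```

A real plugin would obtain the reading from the host (or from Ganglia) instead of hard-coding it, and end with `sys.exit(main())` so that Icinga2 sees the exit code. Keeping the thresholds as parameters lets the same check be reused for links of different capacity.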

The evolution of the various sub-detector applications, the data acquisition and the high-level trigger, following their upgraded hardware and designs over the upgrade and running periods, requires a performant and flexible management and configuration infrastructure. The Puppet-based configuration and management system put in place for this phase will be presented, showing its flexibility in supporting a large heterogeneous system as well as its ability to perform both bulk installations from scratch and rapid cluster-wide installations of CMS software. A number of custom tools have been developed to let end users update RPM-based installations, a feature not typically supported in a data centre environment. The performance of the system will also be presented, with insights into how it scales with the increasing farm size over this data-taking run.

Such a large and complex system requires redundant, flexible core infrastructure services to support it. Details will be given on how a flexible and highly available infrastructure has been put in place, leveraging high-availability technologies at several levels, from network redundancy, through virtualisation, to highly available services managed with Pacemaker/Corosync.

To conclude, a roundup of the tools and solutions used in the CMS cluster administration will be given, pulling all of the above into a coherent, performant and scalable system.

Primary keyword: Computing facilities
Secondary keyword: DAQ

Primary author

Co-authors

Andre Georg Holzner (Univ. of California San Diego (US))
Attila Racz (CERN)
Benjamin Gordon Craigs (CERN)
Christian Deldicque (CERN)
Christoph Paus (Massachusetts Inst. of Technology (US))
Christoph Schwick (CERN)
Cristian Contescu (Fermi National Accelerator Lab. (US))
Dainius Simelevicius (CERN)
Dominique Gigi (CERN)
Emilio Meschi (CERN)
Frank Glege (CERN)
Frans Meijers (CERN)
Georgiana Lavinia Darlea (Massachusetts Inst. of Technology (US))
Guillelmo Gomez Ceballos Retuerto (Massachusetts Inst. of Technology (US))
Hannes Sakulin (CERN)
James Gordon Branson (Univ. of California San Diego (US))
Jean-Marc Olivier Andre (Fermi National Accelerator Lab. (US))
Jeroen Hegeman (CERN)
Jonathan Fulcher (CERN)
Lorenzo Masetti (CERN)
Luciano Orsini (CERN)
Marco Pieri (Univ. of California San Diego (US))
Nicolas Doualot (Fermi National Accelerator Lab. (US))
Olivier Chaze (CERN)
Petr Zejdl (Fermi National Accelerator Lab. (US))
Philipp Maximilian Brummer (CERN)
Raul Jimenez Estupinan (CERN)
Remigius K. Mommsen (Fermi National Accelerator Lab. (US))
Samim Erhan (Univ. of California Los Angeles (US))
Sergio Cittolin (Univ. of California San Diego (US))
Srecko Morovic (CERN)
Thomas Reis (CERN)
Ulf Behrens (Deutsches Elektronen-Synchrotron (DE))
Vivian O'Dell (Fermi National Accelerator Laboratory (FNAL))
Zeynep Demiragli (Massachusetts Inst. of Technology (US))