Speaker
Mr
William Tomlin
(CERN)
Description
The collaboration between BARC and CERN is driving a series of enhancements to ELFms
[1], the fabric management tool-suite developed with support from the HEP community
under CERN's coordination. ELFms components are used in production at CERN and a
large number of other HEP sites for automatically installing, configuring and
monitoring hundreds of clusters comprising thousands of nodes. Developers at BARC
and CERN are working together to improve security, functionality and scalability in
the light of feedback from site administrators. In a distributed Grid computing
environment with thousands of users accessing thousands of nodes, reliable status and
exception information is critical at each site and across the grid. It is therefore
important to ensure the integrity, authenticity and privacy of information collected
by the fabric monitoring system. A new layer has been added to Lemon, the ELFms
monitoring system, to enable the secure transport of monitoring data between
monitoring agents and servers by using a modular plug-in architecture that supports
RSA/DSA keys and X.509 certificates. In addition, the flexibility and robustness of
Lemon have been further enhanced by the introduction of a modular configuration
structure, the integration of exceptions with the alarm system and the development of
fault-tolerant components that enable automatic recovery from exceptions. To address
operational scalability issues, CCTracker, a web-based visualization tool, is being
developed. It provides both physical and logical views of a large Computer Centre
and enables authorized users to locate objects and perform high-level operations
across sets of objects. Operations staff will be able to view and plan elements of
the physical infrastructure and initiate hardware management workflows such as mass
machine migrations or installations. Service Managers will be able to easily
manipulate clusters or sets of nodes, modifying settings, rolling out
software updates and initiating high-level state changes.
[1] http://cern.ch/elfms
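The abstract does not specify the interface of Lemon's secure transport plug-ins. The following Python sketch is purely illustrative. It shows one way a modular plug-in layer could let monitoring agents sign samples and let servers verify them. All names (AuthPlugin, HmacSha256Plugin, register, wrap, unwrap) are hypothetical. An HMAC shared-secret plug-in stands in for the RSA/DSA and X.509 plug-ins described above, because those would require a third-party cryptographic library. The sketch covers integrity and authenticity only, not privacy (encryption).

```python
import hashlib
import hmac
import json
from abc import ABC, abstractmethod


class AuthPlugin(ABC):
    """Hypothetical plug-in interface: sign and verify message payloads."""

    name: str

    @abstractmethod
    def sign(self, payload: bytes) -> bytes: ...

    @abstractmethod
    def verify(self, payload: bytes, signature: bytes) -> bool: ...


class HmacSha256Plugin(AuthPlugin):
    """Stand-in plug-in using a shared secret (NOT RSA/DSA or X.509)."""

    name = "hmac-sha256"

    def __init__(self, key: bytes) -> None:
        self._key = key

    def sign(self, payload: bytes) -> bytes:
        return hmac.new(self._key, payload, hashlib.sha256).digest()

    def verify(self, payload: bytes, signature: bytes) -> bool:
        # Constant-time comparison avoids leaking timing information.
        return hmac.compare_digest(self.sign(payload), signature)


# Registry of available plug-ins, keyed by method name.
PLUGINS: dict[str, AuthPlugin] = {}


def register(plugin: AuthPlugin) -> None:
    PLUGINS[plugin.name] = plugin


def wrap(sample: dict, method: str) -> dict:
    """Agent side: serialise a monitoring sample and attach a signature."""
    payload = json.dumps(sample, sort_keys=True).encode()
    sig = PLUGINS[method].sign(payload)
    return {"method": method, "payload": payload.decode(), "sig": sig.hex()}


def unwrap(message: dict) -> dict:
    """Server side: select the plug-in named in the message and verify it."""
    plugin = PLUGINS[message["method"]]
    payload = message["payload"].encode()
    if not plugin.verify(payload, bytes.fromhex(message["sig"])):
        raise ValueError("signature check failed")
    return json.loads(payload)
```

Because the transport code only calls sign and verify, a key-pair or certificate-based plug-in could be registered alongside the HMAC one without changing wrap or unwrap. That is the property a modular plug-in design is meant to provide.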
Primary authors
Mr
German Cancio Melia
(CERN)
Mr
William Tomlin
(CERN)
Co-authors
Mr
Dinesh Sarode
(BARC)
Mr
Miroslav Siket
(CERN)
Mr
Murthy Chandragiri
(BARC)
Mr
P.S. Dhekne
(BARC)
Mr
Ramgopal Mundada
(BARC)
Mr
Sharma Rohitashva
(BARC)
Mr
Sonika Sachdeva
(BARC)