14-18 October 2013
Amsterdam, Beurs van Berlage
Europe/Amsterdam timezone

Self managing experiment resources

14 Oct 2013, 15:00
45m
Grote zaal (Amsterdam, Beurs van Berlage)

Grote zaal

Amsterdam, Beurs van Berlage

Poster presentation Distributed Processing and Data Handling A: Infrastructure, Sites, and Virtualization Poster presentations

Speakers

Federico Stagni (CERN) Mario Ubeda Garcia (CERN)

Description

Within this paper we present an autonomic Computing resources management system used by LHCb for assessing the status of their Grid resources. Virtual Organizations Grids include heterogeneous resources. For example, LHC experiments very often use resources not provided by WLCG and Cloud Computing resources will soon provide a non-negligible fraction of their computing power. The lack of standards and procedures across experiments and sites generated the appearance of multiple information systems, monitoring tools, ticket portals, etc... which nowadays coexist and represent a very precious source of information for running HEP experiments Computing systems as well as sites. These two facts lead to many particular solutions for a general problem: managing the experiment resources. In this paper we present how LHCb, via the DIRAC interware addressed such issues. With a renewed Central Information Schema hosting all resources metadata and a Status System ( Resource Status System ) delivering real time information, the system controls the resources topology, independently of the resource types. The Resource Status System applies data mining techniques against all possible information sources available and assesses the status changes, that are then propagated to the topology description. Obviously, giving full control to such an automated system is not risk-free. Therefore, in order to minimise the probability of misbehavior, a battery of tests has been provided in order to certify the correctness of its assessments. We will demonstrate the performance and efficiency of such a system in terms of cost reduction and reliability.

Primary authors

Presentation Materials