CVMFS monitoring

513/R-068 (CERN)

Description
Two main topics to discuss:
 * Nagios boxes migration
 * CVMFS monitoring

Participants: Eddie, Luca, Costin, Pablo, Nicolo', Marian, Maarten, Stefan (last to arrive)

Remote: David C.

Minutes taken by Luca

Volunteers for future minutes: Andrea, Stefan

__________________________________________________________________________________________

 

Pablo: the first topic is the SAM-Nagios migration, because there are several related actions.

The goal is to move SAM-Nagios from Quattor to the Agile Infrastructure, replicating a production/pre-production/development configuration.

The presentation from Pablo shows a configuration overview and a list of items to be addressed.
 

Marian on the items:

 * use of standard Nagios is not possible

 * integration of the new messaging library in the current code can be a lot of work

 

General discussion on what differences to expect from SLC6/UMD3. SLC6 should not be much different, but at least the new version of Python (2.6) will have some impact on the probes.

Marian proposes to first re-factor NCG/Job framework, then work on porting the re-factored system to the new SLC6/UMD3 and new libraries.
 

Pablo proposes to attempt a migration of the current system, to get an estimate of the time needed for the port.

 

The final agreement is to work on both in parallel: attempting a migration of the current system (using SAM-Nagios v23 for the exercise) to understand the current limitations, and starting the work on re-factoring the NCG/Job framework. Both tasks are to be done bearing in mind that the lifetime of the current SAM-Nagios expires at the end of 2014. Time estimates for the issues will be made offline.

 

 

Costin's presentation - new ALICE metrics and CVMFS monitoring

——

 

A mechanism is in place to publish new metrics based on metrics reported by MonALISA. The advantage is that real workflow information is used to estimate a site's status.


After the presentation, one subject of discussion was who is responsible for recomputations.
With this new approach, it should be clearly defined whom sites have to contact for recomputation requests, given that a recomputation is only meaningful if it is acknowledged by the person providing the metric.

 

The tool uses a standard ActiveMQ STOMP library (activemq-stomp with SSL); there is no plan to use the directory queue from the MIG team.
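
For illustration only, here is a minimal sketch of what publishing one metric result as a STOMP message involves. It builds the SEND frame by hand so that it needs no client library; the actual tool uses a STOMP library over SSL, which is not shown. The destination name, the JSON payload layout and the field names are assumptions made for this example and were not discussed in the meeting.

```python
import json


def metric_message(site, metric, status, timestamp):
    """Encode one metric result as JSON (hypothetical payload layout)."""
    return json.dumps(
        {"site": site, "metric": metric, "status": status, "timestamp": timestamp},
        sort_keys=True,
    )


def build_stomp_send_frame(destination, body):
    """Serialise a STOMP SEND frame.

    Frame layout: command line, header lines, blank line, body, NUL byte.
    Header values are not escaped here, so they must not contain
    colons or newlines.
    """
    payload = body.encode("utf-8")
    headers = [
        ("destination", destination),
        ("content-type", "application/json"),
        ("content-length", str(len(payload))),
    ]
    head = "SEND\n" + "".join(f"{key}:{value}\n" for key, value in headers) + "\n"
    return head.encode("utf-8") + payload + b"\x00"


# Hypothetical site, metric and destination names, for illustration only.
body = metric_message("EXAMPLE-SITE", "example.metric", "OK", "2014-05-22T14:00:00Z")
frame = build_stomp_send_frame("/topic/example.metrics", body)
```

The resulting bytes are what a STOMP client would write to an already connected (and, in this case, SSL-wrapped) socket after the CONNECT handshake.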

 

The second part of the presentation was about CVMFS Monitoring. Costin proposes to install monitoring on the server side, which could even configure itself, connecting to the closest working higher stratum.
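
As a rough sketch of the self-configuration idea, the selection of an upstream server could look like the code below. It assumes that "closest" means the lowest measured response time and that "working" means the server answers an HTTP request at all; both interpretations, the function names and the host names are assumptions made for this example, not part of Costin's proposal.

```python
import time
import urllib.request


def http_probe(url, timeout=5.0):
    """Return the response time in seconds, or None if the request fails."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read(1)
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return None
    return time.monotonic() - start


def pick_upstream(candidates, probe=http_probe):
    """Return the reachable candidate with the lowest latency, or None."""
    best_url, best_latency = None, None
    for url in candidates:
        latency = probe(url)
        if latency is None:
            continue  # not working: skip this candidate
        if best_latency is None or latency < best_latency:
            best_url, best_latency = url, latency
    return best_url


# Usage with a fake probe, so that no network access is needed.
# The host names are placeholders.
fake_latencies = {
    "http://stratum-a.example.org": 0.120,
    "http://stratum-b.example.org": None,  # unreachable
    "http://stratum-c.example.org": 0.045,
}
chosen = pick_upstream(fake_latencies, probe=fake_latencies.get)
```

Passing the probe as a parameter keeps the selection logic testable without contacting any real server.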

The idea of using this approach for generic CVMFS monitoring is appealing for ALICE sites but more complicated for other VOs' sites. The main challenge will be to convince sites to install it, and given that it configures itself automatically…

Nicolo: ATLAS and CMS already have site squid monitoring

Costin: They have the client side. This approach will do the server side, and it will be VO independent.

This topic should be presented at a WLCG Operations Coordination meeting and/or the WLCG Workshop in July.

Experimentation will start with some ALICE sites. For sites serving ALICE and other VOs, it will be enabled as an additional infrastructure monitoring service. This work will have to be presented at one of the operations meetings.

 

Next meeting: 20/6 at 14:00. The topic will be the metrics of MyWLCG trends.

    • 14:00 14:05
      JIRA actions for May 5m
      Review of the JIRA actions scheduled for this and next month https://its.cern.ch/jira/issues/?filter=13902
      Speaker: Pablo Saiz (CERN)
    • 14:05 14:15
      Nagios boxes migration 10m
      Speaker: Pablo Saiz (CERN)
      Slides
    • 14:15 14:35
      ALICE metrics and cvmfs monitoring 20m
      Speaker: Costin Grigoras (CERN)
      Slides
    • 14:35 14:55
      Discussion 20m
    • 14:55 15:00
      Next meeting 5m