Site Nagios and Availability comparison

Europe/Zurich
513/R-068 (CERN)

Description
Topics to discuss: project plan, Nagios plugin, availability comparison
Videoconference room: WLCG_monitoring_consolidation
Room description: Kick-off meeting for the WLCG monitoring consolidation project
Extension: 109258925
Owner: Pablo Saiz
Present at CERN:
Stefan, Eddie, Lionel, Pablo, Andrea, Maarten, Marian, Julia, Elena, Nicolo (last to arrive)
Remotely:
Alessandra, Alessandro, Pepe, Jordi, David, Luca

The meeting first walked through the list of tasks. For April-May there are 19 open tickets. Some of them relate to the reports presented at this meeting and will be closed before the next one.
Migration to AI is progressing very well: only 3 VMs and 2 physical machines still have to be migrated.
Marian commented that before the OPS tests are stopped we need to clarify what to do with the C5 reports and SLS, which are integrated with the OPS tests.
Marian also mentioned that the old message brokers should be retired, but ATLAS still publishes data to the old broker. ATLAS has been contacted, but no action has been taken on the ATLAS side yet. Alessandra will follow up with ATLAS.

The second item on the agenda was a presentation from Jordi about the Nagios plugin.
Maarten suggested uploading the plugin to the WLCG repository. David commented that he had tried the plugin and it worked fine for him. Pepe and Jordi mentioned that they were also in touch with Nikhef, who had tried the plugin as well.

Next was a presentation from Elena about the validation of SAM3 and the comparison of availability/reliability numbers between SAM3 and SUM.
All differences are understood.
There was a discussion about test validity, which is currently configured in SAM3 as 2 hours. Andrea mentioned that a 24-hour validity was agreed with the MB and cannot be changed without the MB's approval. The suggestion was to make it possible to have different profiles with different validities. Pablo said that this is not possible in the current implementation, since validity is attached to a particular test, but that it might be possible to change this.
Alessandro asked whether the MB reports list site names in the GOCDB or the VO convention. The suggestion was to have both names on the reports, which is currently the case only for T1s.
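
To illustrate the validity discussion, here is a minimal Python sketch of how a per-profile validity could override the validity attached to a test. It is not SAM3 code: the test and profile names, the lookup tables, and the use of "UNKNOWN" for an expired result are all assumptions made for illustration. Only the 2-hour and 24-hour values come from the discussion.

```python
from datetime import datetime, timedelta

# Validity attached to a test (hypothetical test name; 2 hours as currently set in SAM3).
TEST_VALIDITY = {"example-test": timedelta(hours=2)}

# Optional per-profile override (hypothetical profile name; 24 hours as agreed with the MB).
PROFILE_VALIDITY = {("mb-report", "example-test"): timedelta(hours=24)}


def validity_for(profile: str, test: str) -> timedelta:
    """Use the profile-specific validity if one exists, else the test's own."""
    return PROFILE_VALIDITY.get((profile, test), TEST_VALIDITY[test])


def status_at(result_time: datetime, status: str, now: datetime,
              validity: timedelta) -> str:
    """Return the last status while it is still valid, otherwise 'UNKNOWN' (assumed)."""
    return status if now - result_time <= validity else "UNKNOWN"


# A result reported at 10:00 and evaluated three hours later.
reported = datetime(2014, 4, 1, 10, 0)
evaluated = datetime(2014, 4, 1, 13, 0)

default_status = status_at(reported, "OK", evaluated,
                           validity_for("default", "example-test"))
mb_status = status_at(reported, "OK", evaluated,
                      validity_for("mb-report", "example-test"))
```

Under these assumptions the same three-hour-old result has expired for a profile using the 2-hour test validity but still counts for a profile using 24 hours, which is why the choice of validity can shift availability numbers.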

The next meeting will be on the 25th of April. Robert will report on the evaluation of PostgreSQL compared to Oracle. Pablo also asked the VO contacts to try out SAM3 and provide feedback.


    • 14:00 - 14:05
      JIRA actions for April (5m)
      Review of the JIRA actions scheduled for this and next month: https://its.cern.ch/jira/issues/?filter=13902
      Speaker: Pablo Saiz (CERN)
    • 14:05 - 14:25
      Site Nagios plugin (20m)
      Speaker: Jordi Casals (PIC)
      Slides
    • 14:25 - 14:45
      Availability comparison between SAM and SAM3 (20m)
      Speaker: Elena Tikhonenko (Joint Inst. for Nuclear Research (RU))
      Slides
    • 14:45 - 14:55
      Discussion (10m)
    • 14:55 - 15:00
      Next meeting (5m)