AI Monitoring

Europe/Zurich
513/1-024 (CERN)

Attendance

IT-CF      Luis Pigueiras, Massimo Paladin, Miguel Santos, Pedro Andrade, Ivan Fedorko (speaker)
IT-CIS     Marek Domaracky
IT-CS      Véronique Lefebure
IT-DB      Georgios Kaklamanos
IT-DI      Denise Heagerty
IT-DSS     Alex Iribarren, Jan Iven
IT-OIS     Tim Bell, Zilli Stefano
IT-PES     Gavin Mccance, Ioannis Agtzidis, Manuel Guijarro, Steve Traylen, Vítor Gouveia
PH-ATLAS   Alexey BuzyKaev, Sergey Baranov, Yuri Smirnov
PH-LBC     Loic Brarda
PH-LCD     Andre Sailer
PH-UCM     Ivan Glushkov

Questions


Q - Are you going to provide common tools to check the status of a node?
A - Yes, there will be tools: the Lemon CLI, Roger, etc.

Q - Why isn't Roger in the architecture?
A - Roger is a SNOW consumer and does not provide any notifications.

Q - What are the plans for the dashboard?
A - The dashboard is a secondary tool that provides an overview of what happened over the last few days.

Q - How are metrics integrated with FEs? Can we define a metric to reach the service manager?
A - Each metric has its own responsible and an associated FE. The exception is hardware, where at the moment we don't know where the exceptions are supposed to go. It is possible to define metrics per box. The Puppet variables define the responsible.
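
As an illustration of the answer above, the lookup from a metric to its FE could be sketched as follows. This is only a sketch under stated assumptions, not the GNI implementation: the class, the function, the example metric, e-group and FE names are all hypothetical, and the rule that a per-box definition takes precedence over the metric's default FE is an assumption that was not stated in the meeting.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Metric:
    """Hypothetical metric registration record (not a real GNI type)."""
    name: str
    responsible: str            # e-group responsible for the metric
    fe: Optional[str] = None    # associated FE; None where the target is undecided


def resolve_target(metric: Metric, box_overrides: Dict[str, str]) -> Optional[str]:
    """Pick the FE for an exception raised on one box.

    Assumed precedence: a per-box definition wins over the metric's default FE.
    """
    override = box_overrides.get(metric.name)
    return override if override is not None else metric.fe


# Hypothetical example data.
swap_full = Metric("swap_full", "example-egroup", "Example Operations FE")
hw_fault = Metric("hardware_fault", "example-hw-egroup")  # hardware: FE not yet decided

print(resolve_target(swap_full, {}))
print(resolve_target(swap_full, {"swap_full": "Example Service FE"}))
print(resolve_target(hw_fault, {}))
```

In this sketch the hardware metric resolves to no FE at all, which mirrors the open point in the answer; a real setup would need an explicit fallback target.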

Q - Who is the metric manager?
A - The metric manager is the person responsible for the e-group.

Q - In some situations the desired FE target for many Lemon exceptions in SNOW should be the FE that owns the box (e.g. swap full). What should be done?
A - The tickets get redirected to the application owner rather than to the operators.

Q - What is the status of the migration of the alarms, and how many alarms are defined in Quattor?
A - Around 500 alarms are defined. Some of them are legacy, and not all of them are going to be migrated.

Q - Where are the alarms and exceptions defined?
A - The notification message contains a link where the meaning of the alarm/exception can be checked.


Future meetings

The next meeting will be in two weeks and is expected to cover the new development workflow for Configuration Management.

 

    • 14:00 - 15:00
      Alarming with General Notification Infrastructure (GNI)
      • Metric registration
      • Lemon Producer
      • Service-Now integration
      • GNI Dashboard
      • No Contact Processor
      • Current status and next steps
      Convener: Ivan Fedorko (CERN)