COD Pole1 - ROD model assessment

Europe/Zurich
EVO (Virtual Meeting)

Marcin Radecki (Unknown)
Description
Goal: assessment of operations in the ROD model after 5 weeks
Outcome: decide on the major features of the dashboard to be implemented now
Connecting details:
  1. EVO
    Title: Pole1 - ROD assessment, Community: UNIVERSE
    Meeting password: pole1
    From: 13:00 (meeting starts at 13:30) CEST
    To: 15:00 CEST
    At the EVO gate, click "Start" and browse to the right link in the list ("Pole1 - ROD assessment").
  2. EVO by phone call
    EVO Phone Bridge Telephone Numbers:
    Slovakia (UPJS, Kosice) +421 55 234 2420
    Switzerland (CERN, Geneva) +41 22 76 71400
    Italy (INFN, several cities) Phone numbers: enter '4000' to access the EVO bridge
    Germany (DESY, Hamburg) +49 40 8998 1340
    USA (BNL, Upton, NY) +1 631 344 6100
    USA (Caltech, Pasadena, CA) +1 626 395 2112
    Phone Bridge identifier: 769732 password: 9112
  3. RMS videoconference (backup)
    -- Title : Pole1 phone conf 2009-01
    -- Date : 2009 February 06
    -- Time : 13:30:00 local time
    -- Length : 01:30:00
    -- Call number :
    ---- IP : 193.48.95.69
    ---- ISDN : +33 (0)4 26 68 73 00
    ---- TEL : +33 (0)4 26 68 73 00
    -- Numeric identifier : 22138 (end by #)
    -- PIN code : 6355 (end by #)

    • 13:30 - 15:00
      ROD model assessment after 5 weeks of operation

      Agenda:

      1. Feedback on the dashboard
        1. Summary of improvements from January - Cyril
          Savannah link
          Overview of improvements/bugfixes done during the first weeks of running the dashboard.
        2. New feature request round table
          CE: 1) alarm age should not increase on weekends (#106979 savannah), 2) dashboard view notepad improvements (#106737 savannah) - Malgorzata
          NE:
          SWE:
          AP:
      2. Feedback on the procedures
        -- to be reflected in Vera's document --
        1. Overall assessment of the workflow (1st line support / ROD / C-COD)
          CE: alarms are closed by 1st line. Those which are left are described in the "site notepad" field. ROD and 1st line use a "Jabber conference room" (something similar to an IRC channel).
          NE:
          SWE:
          AP:
        2. Amendment on the thresholds for the above workflow
          CE: we propose to close tickets after 2 days if all tests are OK and the site admin has confirmed that he/she fixed the problem.
          NE:
          SWE:
          AP:
        3. Proposal for C-COD working rules (Malgorzata)
          1. C-COD will NOT handle alarms and tickets.
          2. The main task for C-COD during the week will be sending e-mails to the proper ROD (with CC to C-COD) with a request for explanation and action on particular alarms/tickets.
          3. In case of no action from the ROD, C-COD will raise the issue at the next OM.
          4. At the end of the week the C-COD leader should send a handover to the C-COD mailing list, stating in which state issues were left for the next shift.
      3. Round the table from the 4 federations on other issues
        CE:
        NE:
        SWE:
        AP:
      4. Pole 1 Actions list review, update and roadmap for the next months.
        - [MK+MR] minor modifications to the regional operations model document
        - [MK+STL] to check knowledge base topic readiness by Jan 5th
      5. Strategy to assess the model in mid-March (Helene)
        - define the tools and procedure (metrics, smoothness of operations, other) and the decision process to let 3 more federations into the new model -- by ...when?
        - process to have these federations prepared -- follow the checklist built in Abingdon -- needs modification?
        - prepare a presentation as the pilot feds did in COD-18, something else?
      6. AOB:
        - informal pole meeting in Catania - March 3rd?
        - COD-19 agenda: the training session needs trainers. Pilot federations, please travel on Sunday.
        - COD-19: Is the feedback and assessment gathered in the Pole 1 presentation?
        - COD-19: Can we also start a regional operator forum for feedback of experience at COD-19? How?
        - COD-19: Any specific needs on the format of the meeting, specific needs for presentations / GGUS? Other?
      Minutes:
      Attendance: Vera, Luuk, Helene, Cyril, Kai, Malgorzata, Marcin.
      1. Discussion on Savannah tasks related to the Regional Dashboard. Savannah link: https://savannah.cern.ch/support/?group=cicportal
        1. 106519: Implementation of ROD metrics
          Some metrics are already implemented. They can be seen at: https://cic.gridops.org/index.php?section=roc&page=dashboard&subpage=metrics
          Action on Marcin to start a discussion with people on whether the current metrics are useful and what could be added.
        2. 106551: Implementation of CCOD metrics
          Needs a better definition. Linked with the action above.
        3. 106574: Need to specify another scope than federation in dashboard
          Cyril: Action ongoing. Shall be ready for COD-19. It will be possible to group sites for the view, independent of country or any other criterion. There will be no country view, just grouping.
          Helene: This is interesting for Ioannis (SEE) and Vera (NE).
        4. 106575: Implement two separate views for 1st line support and ROD role
          Not clear for Cyril.
          Vera: It is confusing for people if they are 1st line support and use the same portal as ROD. It is better to have two different pages.
          Marcin: Functionality of 1st line support and ROD shall be categorized and their abilities tuned to the role they have in the specific region. Agrees with Vera's point about two different pages.
          Malgorzata: ROD has unnecessary views, such as young alarms.
          This topic needs to be discussed further.
          Action on Cyril to trigger a discussion on it. Helene suggested that Cyril make a list of display options and discuss with people which would be most suitable.
        5. 106576: Display enhancements to improve readability
          AP had some comments:
          a-1. To have options to hide sites without alarms and problems. (We can focus on the problematic sites. For independent issues with an OK site, email and voice are used.)
          a-2. To have options to hide masked alarms, since we deal with the "main" alarms on the dashboard. In case we want to see masked alarms, we go to the 'Alarms' interface. (We are wondering about the purpose of dealing with masked alarms independently.)
          This shall be put into the ticket.
        6. 106577: IRC channel for CCOD
          Considered not necessary: in the current work model we have a CCOD leader who deals with the tasks, and the other CCODers are best contacted by e-mail. If we find it inefficient we can go for IRC.
        7. 106737: notepad in Alarm view and dashboard
          Helene: do we have a use case here?
          Vera: probably the issue is when more people are using the same notepad concurrently.
          Cyril: implementation is technically difficult, as each time the notepad is accessed it would require reading it from the DB.
          Helene: there shall be a component for notepad done already.
        8. 106979: alarm age should not increase on weekends
          Cyril: calculation of time is done in the Lavoisier module, so we will probably need to apply a patch which recalculates the time in PHP.
          Marcin: this is required by CE, as without this feature the dashboard does not fulfil the model requirements on Monday mornings (a lot of alarms >24h which however shall not be escalated).
          Kai/Vera: there are plans to put public holidays in GOCDB so admins can declare that they will not work on those days. Special category of SD: "public holiday".
          Action on Kai to submit it to OAT. (?MR - not sure here)
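The weekend rule requested in 106979 can be illustrated with a small calculation. This is only a sketch in Python, not the Lavoisier or PHP code discussed above: the function name is hypothetical, timestamps are assumed to be naive datetimes in a single local time zone, and public holidays are not handled.

```python
from datetime import datetime, timedelta


def alarm_age_excluding_weekends(raised: datetime, now: datetime) -> timedelta:
    """Return the alarm age, counting only time that falls on Monday-Friday.

    Assumption: both arguments are naive datetimes in the same local time zone.
    """
    if now <= raised:
        return timedelta(0)
    age = timedelta(0)
    cursor = raised
    while cursor < now:
        # Walk day by day: the current segment ends at the next midnight or at `now`.
        next_midnight = datetime.combine(
            cursor.date() + timedelta(days=1), datetime.min.time()
        )
        segment_end = min(next_midnight, now)
        if cursor.weekday() < 5:  # 0 = Monday ... 4 = Friday
            age += segment_end - cursor
        cursor = segment_end
    return age


# Alarm raised on Friday 18:00, checked on Monday 09:00.
raised = datetime(2009, 2, 6, 18, 0)
monday_morning = datetime(2009, 2, 9, 9, 0)
age = alarm_age_excluding_weekends(raised, monday_morning)
```

With this rule the example alarm is 15 hours old on Monday morning instead of 63, so it stays below the 24h mark mentioned by Marcin.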
      2. Feedback on dashboard
        CE: issues covered in 106979 and 106737.
        NE: issues covered in 106575 and 106574.
        Luuk: 1) combine 1st line and ROD view 2) switch off alarms that are OK
        SWE: 1) region to have customized template for tickets 2) adapt duties between 1st line and ROD. Change ticket escalation procedure.
        AP:
          1) To have options to hide sites without alarms and problems. (We can focus on the problematic sites. For independent issues with an OK site, email and voice are used.)
          2) To have options to hide masked alarms, since we deal with the "main" alarms on the dashboard. In case we want to see masked alarms, we go to the 'Alarms' interface. (We are wondering about the purpose of dealing with masked alarms independently.)
          3) To have options to rank/sort alarms. This is to help us prioritize work.
          4) Hope to have more automation to help simplify operation tasks, e.g. automatically switch off alarms that have 'OK' status and are non-critical within a certain period of time, or have the dashboard provide a function to close all 'OK' alarms at the same time.
          5) To deal with alarms within 24 hrs, it would be more convenient if 1st line support could open tickets directly in the 'Alarms' interface.
          Some of these are covered in 106576 and shall be expressed there.

        All federations are encouraged to express their comments on the dashboard in Savannah tickets.
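
The display options requested by AP (hide sites without problems, hide masked alarms, rank alarms) could behave roughly as sketched below. The `Alarm` fields, the status values and the function name are assumptions made for this illustration only, not the dashboard's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    site: str
    status: str       # hypothetical values: "OK" or "ERROR"
    masked: bool
    age_hours: float


def dashboard_view(alarms, hide_masked=True, hide_ok_sites=True):
    """Group alarms per site for display, most urgent alarm first."""
    shown = [a for a in alarms if not (hide_masked and a.masked)]
    by_site = {}
    for alarm in shown:
        by_site.setdefault(alarm.site, []).append(alarm)
    if hide_ok_sites:
        # Keep only sites that still show at least one failing alarm.
        by_site = {
            site: site_alarms
            for site, site_alarms in by_site.items()
            if any(a.status != "OK" for a in site_alarms)
        }
    for site_alarms in by_site.values():
        # Failing alarms first, then oldest first.
        site_alarms.sort(key=lambda a: (a.status == "OK", -a.age_hours))
    return by_site


alarms = [
    Alarm("A", "OK", False, 2),
    Alarm("B", "ERROR", False, 5),
    Alarm("B", "ERROR", True, 30),
    Alarm("B", "OK", False, 40),
    Alarm("B", "ERROR", False, 26),
]
view = dashboard_view(alarms)
full_view = dashboard_view(alarms, hide_masked=False, hide_ok_sites=False)
```

Both options are plain switches, so a ROD that wants to inspect masked alarms or healthy sites can turn the filtering off without a separate interface.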
      The meeting ran out of time and had to be cut short, as extending it did not suit the participants. The next meeting will be in March to prepare for COD-19.