WLCG GGUS Operations

Every day

Scan through the GGUS notifications in your inbox. They may concern GGUS tickets for:

  • ROC_CERN, i.e. Grid Services. Here it is important that the GGUS-SNow interface works well and that the right supporters follow up the tickets with a response time appropriate to the ticket priority and type. ALARM tickets should be in the hands of the experts within 1 hour.
  • The GGUS Support Unit (SU), i.e. incidents and requests related to the GGUS infrastructure itself.
  • Any other SU to which you belong. There you typically would act as a supporter.
  • Any GGUS ticket that was brought up at the WLCG Operations (Coordination) meetings as having a problem in routing or response.
  • Any GGUS ticket for which you receive, as a member of the ggus-escalation-notifications e-group, an email with Subject: REMINDER Escalation Level X. Investigate the user complaint behind each such email.
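
Escalation reminders can be picked out of a mailbox by their subject line. The sketch below is only an illustration: it assumes that the X in "REMINDER Escalation Level X" is a number, and the function name escalation_level is hypothetical rather than part of any GGUS tool.

```python
import re

# Assumption: the level "X" is numeric. This page only gives the fixed
# part of the subject, "REMINDER Escalation Level X".
ESCALATION_RE = re.compile(r"REMINDER Escalation Level\s+(\d+)")


def escalation_level(subject):
    """Return the escalation level found in a mail subject, or None."""
    match = ESCALATION_RE.search(subject)
    return int(match.group(1)) if match else None
```

A subject without the reminder text yields None, so ordinary GGUS notifications are left alone.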

In case of GGUS downtime

The e-group ggus-downtimes contains four sub-e-groups, named ggus-downtimes-VOname (VOname = alice | atlas | cms | lhcb). When the GGUS developers publish a downtime (scheduled or not) in GOCDB they should email the top-level e-group in addition. The sub-e-group members within the experiments decide whom to inform in their community.

Every Monday

Update the GGUS section in the appropriate page under WLCGOperationsMeetings with announcements of upcoming releases or debug information for relevant issues, if any. Participate in this meeting. If you can't be there, please read the meeting notes to check whether any GGUS tickets were wrongly assigned or not properly followed up. The notes might also mention new development requests, problems with the SNow or OSG interfaces, misunderstandings concerning the workflows, TEAM or ALARM ticket creators in the experiments who lost their privileges, etc.

Before the WLCG MB (on Monday)

Prepare the graph of tickets

Update the file GGUS-inc-tickets.xlsx (NEW!). Download the latest version from the WLCGOperationsMeetings page, where it should be permanently attached. This file contains weekly summaries so that the corresponding graph will show GGUS incident ticket evolution over regular intervals. You need to cover the period from the Monday before the previous MB up to the Monday preceding the current MB, one week at a time. Open the GGUS-inc-tickets.xlsx file and:

  • move the mouse to the bottom right corner of the table where you will see a small mark;
  • click on that mark and drag the mouse downward to extend the table;
  • please add exactly the number of rows that are needed to cover the weeks for your report.

The dates and the totals per experiment will be filled in automatically. When needed, please move the graph area downward to make space for the additional rows.
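
To work out how many rows are needed, the weeks of the reporting period can be listed as follows. This is a minimal sketch under two assumptions: each row is identified by the Monday that starts the week, and the Monday preceding the current MB closes the period without starting a row of its own. The function name report_weeks is hypothetical.

```python
from datetime import date, timedelta


def report_weeks(prev_mb_monday, curr_mb_monday):
    """List the Monday starting each week from prev_mb_monday up to,
    but not including, curr_mb_monday."""
    for day in (prev_mb_monday, curr_mb_monday):
        if day.weekday() != 0:  # Monday is weekday 0
            raise ValueError(f"{day} is not a Monday")
    weeks = []
    day = prev_mb_monday
    while day < curr_mb_monday:
        weeks.append(day)
        day += timedelta(days=7)
    return weeks
```

The length of the returned list is the number of rows to add to the table.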

The GGUS Report Generator is used to obtain the numbers for each additional row, per experiment. Instructions:

  1. Open the GGUS Report Generator.
  2. Select the period from Monday-week-previous-MB to Monday-week-current-MB.
  3. Select the 4 LHC VOs and click on Group by.
  4. Select ticket types USER and TEAM only. Do not select ALARM!
  5. Select ticket category Incident only.
  6. Select weekly aggregation.
  7. Click GO!
  8. Write the totals of each week in your local copy of file GGUS-inc-tickets.xlsx.
  9. To help avoid mistakes, compare the value in each column with the ones directly above:
    big changes in any of the columns ought to be rare.
  10. The ALARM ticket handling is described below.
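
The comparison in step 9 can also be done mechanically once the weekly totals are typed in. The sketch below flags any value that differs strongly from the one directly above it. The thresholds (factor and min_count) and the function name big_changes are hypothetical choices for illustration; this page does not define what counts as a big change.

```python
def big_changes(rows, factor=3.0, min_count=10):
    """Flag cells that differ from the row directly above by more than
    `factor`. Each row is a dict mapping a column name to a weekly total.

    Returns a list of (row_index, column, value_above, value) tuples.
    """
    flagged = []
    for i in range(1, len(rows)):
        prev, curr = rows[i - 1], rows[i]
        for column, value in curr.items():
            above = prev.get(column, 0)
            low, high = sorted((above, value))
            # Ignore small numbers, where large ratios are not meaningful.
            if high >= min_count and high > factor * max(low, 1):
                flagged.append((i, column, above, value))
    return flagged
```

A flagged cell is not necessarily wrong: it is only a prompt to re-check the number against the Report Generator output.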

Alarm tickets need special attention:

  • Test alarms (e.g. accompanying GGUS releases) are not always marked Test.
  • Use the GGUS search engine to list all alarm ticket candidates:
    • For Special attributes select ALARM-Tickets.
    • For Status ensure all is selected.
    • For Ticket category select Incident.
    • Select the appropriate time period and click Search.
  • Check the subjects of the list of tickets shown: test cases should be obvious
    and must be included neither in GGUS-inc-tickets.xlsx nor in the MB report.
  • Each real alarm ticket should be counted in the GGUS-inc-tickets.xlsx file,
    in the appropriate column of the row for the week in which the ticket was opened.
  • Each real alarm ticket should also be briefly described in the MB report,
    usually in the operations section for the affected experiment and/or site.
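
The alarm bookkeeping above can be sketched as follows, assuming the search results have been copied into a list of dicts with the hypothetical keys "subject", "vo" and "opened" (a date). Because test alarms are not always marked Test, and a real alarm may mention tests in its subject, the subject match only pre-sorts the list: both groups still need to be read by a person.

```python
import re
from collections import Counter
from datetime import date, timedelta

# Heuristic only: matches words starting with "test", e.g. "Test", "tests".
TEST_HINT = re.compile(r"\btest", re.IGNORECASE)


def split_alarms(tickets):
    """Separate likely test alarms from the rest, based on the subject only."""
    likely_test = [t for t in tickets if TEST_HINT.search(t["subject"])]
    candidates = [t for t in tickets if not TEST_HINT.search(t["subject"])]
    return likely_test, candidates


def count_by_week(real_alarms):
    """Count alarms per (Monday of the week the ticket was opened, VO)."""
    counts = Counter()
    for ticket in real_alarms:
        opened = ticket["opened"]
        monday = opened - timedelta(days=opened.weekday())
        counts[(monday, ticket["vo"])] += 1
    return counts
```

The counts are keyed by the Monday of the opening week, which matches the rule that each real alarm goes into the row for the week in which the ticket was opened.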

When all weeks have been done, update the GGUS-inc-tickets.xlsx attachment on the WLCGOperationsMeetings page.
Note that you will need the graph for the MB Service Report, as documented below.

Then, still in the GGUS Report Generator, to simplify filling out the table on the GGUS slide (see below):

  1. Select yearly aggregation.
  2. Click GO!
  3. Write the totals per ticket type for each experiment in the table on the GGUS slide.

Prepare the slide for the MB

  • Please use WLCG-svc-report-wide-v3.pptx (or a newer version) attached to ScodRota for the WLCG Service report.
  • The last slide has a template table for the ticket totals of the covered period.
  • Include the area graph from the latest GGUS-inc-tickets.xlsx attached to WLCGOperationsMeetings.
    • The line graph is there just in case we might need it one day.

Around GGUS release dates

  1. Two days before the release, on Monday at 3pm: announce the upcoming release in the WLCG Operations meeting minutes. Emphasize any important upcoming changes listed in the release notes.
  2. Assist the GGUS team with the follow-up of problematic test alarm tickets, if needed.

About the GGUS-SNow interface

Although the mappings were agreed in January 2011, the interface has suffered from unilateral SNow changes for which GGUS was given no advance notification.

Documentation:

About the GGUS Architecture

Historic documentation:

Before the end-of-year break

Publish this text in the weekly operations meeting:

For the end-of-year break: GGUS is monitored by a system connected to the on-call service. In case of total GGUS unavailability the on-call engineer (OCE) at KIT will be informed and will take appropriate action. If GGUS is available but there is a problem with the workflow (e.g. ALARM to CERN doesn't generate email notification to the operators), then WLCG should submit an ALARM ticket, notifying site FZK-LCG2 (DE-KIT), which triggers a phone call to the OCE. As a last resort, the FZK-LCG2 emergency e-mail or telephone number published in the GOCDB can be contacted.

Topic attachments

  • CHEP-Architecture-LE.docx (r1, 6.0 K, 2014-04-08 11:04, MariaDimou): GGUS Architecture description prepared by Oleg Dulov, KIT. Status of September 2013.
Topic revision: r35 - 2023-09-04 - MaartenLitmaath
 