WLCG GGUS Operations
Every day
Scan through the GGUS notifications in your inbox. They may concern GGUS tickets for:
- ROC_CERN, i.e. Grid Services. Here it is important that the GGUS-SNow interface works well and that the right supporters follow up the tickets with a response time appropriate to the ticket priority and type. ALARM tickets should be in the hands of the experts in less than 1 hour.
- The GGUS Support Unit (SU), i.e. incidents and requests related to the GGUS infrastructure itself.
- Any other SU to which you belong. There you typically would act as a supporter.
- Any GGUS ticket that was brought up at the WLCG Operations (Coordination) meetings as having a problem in routing or response.
- Investigate the user complaint behind any email you receive as a member of the ggus-escalation-notifications e-group with the subject "REMINDER Escalation Level X".
In case of GGUS downtime
The e-group ggus-downtimes contains four sub-e-groups, named ggus-downtimes-VOname (VOname = alice | atlas | cms | lhcb). When the GGUS developers publish a downtime (scheduled or not) in GOCDB, they should also email the top-level e-group. The sub-e-group members within the experiments decide whom to inform in their community.
Every Monday
Update the GGUS section in the appropriate page under WLCGOperationsMeetings with announcements of upcoming releases or debug info for relevant issues, if any. Participate in this meeting. If you can't be there, please read the notes from the meeting, in case there are GGUS tickets that were wrongly assigned or not properly followed up. The notes might also mention new development requests, problems with the SNow or OSG interfaces, misunderstandings concerning the workflows, TEAM or ALARM ticket creators in the experiments who lost their privileges, etc.
Before the WLCG MB (on Monday)
Prepare the graph of tickets
Update the file GGUS-inc-tickets.xlsx. Download the latest version from the WLCGOperationsMeetings page, where it should be permanently attached. This file contains weekly summaries, so that the corresponding graph shows the evolution of GGUS incident tickets over regular intervals. You need to cover the period from the Monday before the previous MB up to the Monday preceding the current MB, one week at a time. Open the GGUS-inc-tickets.xlsx file and:
- move the mouse to the bottom right corner of the table where you will see a small mark;
- click on that mark and drag the mouse downward to extend the table;
- please add exactly the number of rows that are needed to cover the weeks for your report.
The dates and the totals per experiment will be filled in automatically. When needed, please move the graph area downward to make space for the additional rows.
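As a cross-check on the automatically filled dates, the weeks to cover can be listed with a short script. This is only a sketch under stated assumptions: the function names `monday_before` and `weekly_rows` and the example MB dates are hypothetical, each row is assumed to be labelled by the Monday that starts its week, and the week starting on the Monday preceding the current MB is assumed to be incomplete and therefore left out.

```python
from datetime import date, timedelta


def monday_before(day: date) -> date:
    """Return the most recent Monday strictly before `day`."""
    # Monday has weekday() == 0; in that case step back a full week.
    offset = day.weekday() or 7
    return day - timedelta(days=offset)


def weekly_rows(previous_mb: date, current_mb: date) -> list[date]:
    """List the Monday that starts each week to be added to the table."""
    start = monday_before(previous_mb)
    end = monday_before(current_mb)  # assumed to be excluded (week not complete)
    rows = []
    week = start
    while week < end:
        rows.append(week)
        week += timedelta(weeks=1)
    return rows


# Hypothetical MB dates (both Tuesdays), for illustration only.
for monday in weekly_rows(date(2024, 3, 19), date(2024, 4, 16)):
    print(monday.isoformat())
```

The number of dates printed is the number of rows to add to the table.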
The GGUS Report Generator is used to obtain the numbers for each additional row, per experiment. Instructions:
- Open the GGUS Report Generator.
- Select the period from Monday-week-previous-MB to Monday-week-current-MB.
- Select the 4 LHC VOs and click on Group by.
- Select ticket types USER and TEAM only. Do not select ALARM!
- Select ticket category Incident only.
- Select weekly aggregation.
- Click GO!
- Write the totals of each week in your local copy of the file GGUS-inc-tickets.xlsx.
- To help avoid mistakes, compare the value in each column with the ones directly above: big changes in any of the columns ought to be rare.
- The ALARM ticket handling is described below.
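The comparison with the row directly above can also be done mechanically. The following is a minimal sketch, not part of the official procedure: the function name `flag_big_changes` and both thresholds (`max_ratio`, `min_delta`) are arbitrary assumptions, and the ticket counts are made-up numbers.

```python
def flag_big_changes(
    previous: dict[str, int],
    current: dict[str, int],
    max_ratio: float = 3.0,  # assumed threshold, tune to taste
    min_delta: int = 10,     # assumed threshold, ignores noise in small counts
) -> list[str]:
    """Return the columns (VOs) whose weekly total changed suspiciously."""
    flagged = []
    for vo, new in current.items():
        old = previous.get(vo, 0)
        if abs(new - old) < min_delta:
            continue  # small absolute change, not worth a second look
        smaller, larger = sorted((old, new))
        if smaller == 0 or larger / smaller > max_ratio:
            flagged.append(vo)
    return flagged


# Made-up weekly totals for two consecutive rows.
last_week = {"alice": 5, "atlas": 40, "cms": 30, "lhcb": 8}
this_week = {"alice": 6, "atlas": 150, "cms": 28, "lhcb": 9}
print(flag_big_changes(last_week, this_week))
```

A flagged column is not necessarily wrong; it only marks a value worth re-reading from the Report Generator before saving the file.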
Alarm tickets need special attention:
- Test alarms (e.g. accompanying GGUS releases) are not always marked Test.
- Use the GGUS search engine to list all alarm ticket candidates:
  - For Special attributes select ALARM-Tickets.
  - For Status ensure all is selected.
  - For Ticket category select Incident.
  - Select the appropriate time period and click Search.
- Check the subjects of the list of tickets shown: test cases should be obvious and must be included neither in GGUS-inc-tickets.xlsx nor in the MB report.
- Each real alarm ticket should be counted in the GGUS-inc-tickets.xlsx file, in the appropriate column of the row for the week in which the ticket was opened.
- Each real alarm ticket should also be briefly described in the MB report, usually in the operations section for the affected experiment and/or site.
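The counting rule for real alarm tickets can be illustrated with a small script. This is a sketch only: the ticket records, their field names (`subject`, `vo`, `opened`) and the keyword heuristic in `looks_like_test` are assumptions made for illustration, not the GGUS data model. Since test alarms are not always marked as such, the heuristic is just a first pass and the subjects still need to be checked by eye.

```python
import re
from collections import Counter
from datetime import date, timedelta


def looks_like_test(subject: str) -> bool:
    """Heuristic: the subject contains a word starting with 'test'."""
    return re.search(r"\btest", subject, re.IGNORECASE) is not None


def count_real_alarms(tickets: list[dict]) -> Counter:
    """Count non-test alarm tickets per (Monday of opening week, VO)."""
    counts = Counter()
    for ticket in tickets:
        if looks_like_test(ticket["subject"]):
            continue  # test alarms go neither into the file nor into the report
        opened = ticket["opened"]
        week_start = opened - timedelta(days=opened.weekday())
        counts[(week_start, ticket["vo"])] += 1
    return counts


# Made-up tickets for illustration.
tickets = [
    {"subject": "Test ALARM for GGUS release", "vo": "atlas", "opened": date(2024, 3, 20)},
    {"subject": "Latest transfers to site failing", "vo": "cms", "opened": date(2024, 3, 21)},
    {"subject": "Storage unavailable", "vo": "cms", "opened": date(2024, 3, 26)},
]
for (week, vo), n in sorted(count_real_alarms(tickets).items()):
    print(week.isoformat(), vo, n)
```

The word-boundary match avoids discarding subjects that merely contain "test" inside another word, such as "Latest".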
When all weeks have been done, update the GGUS-inc-tickets.xlsx attachment on the WLCGOperationsMeetings page.
Note that you will need the graph for the MB Service Report, as documented below.
Then, to simplify filling out the table on the GGUS slide (see below), go back to the GGUS Report Generator and:
- Select yearly aggregation.
- Click GO!
- Write the totals per ticket type for each experiment in the table on the GGUS slide.
Prepare the slide for the MB
- Please use WLCG-svc-report-wide-v3.pptx (or a newer version) attached to ScodRota for the WLCG Service report.
- The last slide has a template table for the ticket totals of the covered period.
- Include the area graph from the latest GGUS-inc-tickets.xlsx attached to WLCGOperationsMeetings.
- The line graph is there just in case we might need it one day.
Around GGUS release dates
- On the Monday two days before the release, at 3pm: announce the upcoming release in the WLCG Operations meeting minutes. Emphasize any important upcoming changes listed in the release notes.
- Assist the GGUS team with the follow-up of problematic test alarm tickets, if needed.
About the GGUS-SNow interface
Although the mappings were agreed in January 2011, the interface has suffered from unilateral SNow changes for which GGUS was given no advance notification.
Documentation:
About the GGUS Architecture
Historic documentation:
Before the end-of-year break
Publish this text in the weekly operations meeting:
For the end-of-year break: GGUS is monitored by a system connected to the on-call service. In case of total GGUS unavailability the on-call engineer (OCE) at KIT will be informed and will take appropriate action. If GGUS is available but there is a problem with the workflow (e.g. ALARM to CERN doesn't generate email notification to the operators), then WLCG should submit an ALARM ticket, notifying site FZK-LCG2 (DE-KIT), which triggers a phone call to the OCE. As a last resort, the FZK-LCG2 emergency e-mail or telephone number published in the GOCDB can be contacted.