USAG meeting 2009-04-02



Click on the agenda indico confID=55267
to follow links to the items discussed at the meeting!


Attendance
----------
GGUS - Torsten Antoni, Helmut Dres, Guenter Grein
CERN - Diana Bosio, Maria Dimou (chairperson)
UK/I - John Kewley, Claire Devereux
IT   - Riccardo Brunetti, Tiziana Ferrari
CE   - Jan Kmunicek, Dejan Lesjak, Jiri Chudoba
NE - Zeeshan Ali Shah (Sweden), Jan Justinus Keijser (NIKHEF), Ron Trompert (SARA), Roger Oskarsson (HPC2N), Gert Svensson (ROC_NE)
SWE  -
SEE  -
FR   - David Bouvet
Alice-
CMS  - Andrea Sciaba, Marco Calloni
LHCb -
Atlas -

Apologies
----------
Frederic Schaer, Jeff Templon.

Notes by Maria

Agenda:

  1. Agenda approval
  2. Comments on minutes from the February 26th USAG meeting [here]
  3. Continuous user support assessment as required by MSA-1.6 via the weekly escalation reports
  4. Future of ticket triage (TPM).
    Torsten attended the SA1 f2f meeting in Catania last month and was asked, together with the user support team and USAG, to investigate and come back with a proposal on the future of ticket triage (TPM).
    Taking into account:
    1. The volume of tickets in the last months,
    2. The originating VOs,
    3. The user support strategy of the VOs (some LHC VOs do their own triage beforehand, use of Direct Site Notification, etc.),
    who will provide the effort: a central team or the NGIs? Should triage be automated based on the user certificate?
    This is the first of a series of USAG meetings where we are called to shape the User Support strategy in the last year of EGEE and in view of EGI. It is important for us to think carefully about where (i.e. how decentralised) we wish ticket triage, alias 1st Level Support, alias TPM services, to take place.
  5. Review Action List [go to point 8 of these notes.]
  6. A.O.B.
    It's show time folks, again! This week, starting March 30th, the LHC experiment VOs are to perform an ALARM ticket test (full round from ticket opening to ticket closing) against the Tier1s; see [savannah ticket #107452] and [testing rules]. Summary reports must be sent to wlcg-operations@cern.ch by tomorrow, April 3rd, at the latest!
  7. Next meeting date: Thu 30 Apr 2009 @ 9:30 CET ! (Algorithm: Last Thursday of the month)

Discussion:

  1. Agenda: Approved.
  2. Minutes: Approved.
  3. Continuous GGUS assessment via the escalation reports:
    Analysis from the latest ROC and TPM escalation reports: escalation reports for ROCs were evaluated this time. As SWE and SEE were absent, their data were not discussed. CE supporters examine their tickets in a regional weekly meeting, so they have only a few, recent cases in the escalation reports. Further discussion on ROCs will continue at the next USAG (ACTION Maria for the next agenda).

    The escalation reports for TPMs were also discussed to introduce the main theme of this meeting, i.e. the TPM decentralisation in view of the ROCs disappearing in favour of EGI NGIs. There is an almost steady number of GGUS tickets per week (approximately 80) and an equally steady number of tickets that take more than one hour to be assigned (approximately 20, recently up to 35!). This has been the subject of reports for almost 2 years, e.g. TPM monitoring reports in EGEE II - year 2.

  4. Future of ticket triage (TPM)
    Facts:
    The TPM service is currently offered by the ROCs on a weekly rotation. When the ROCs are replaced by NGIs (5 times as many), the relevant service (triage, dispatch, 1st level, TPM and similar terms) will have to change.

    Discussion:
    The EGI Blueprint (section 3.1.1, EGI.org Tasks and Resources, page 14) states 4.5 FTEs for User Support: 2 for GGUS development, 2 for ticket triage and 0.5 for requirements gathering. Here is the relevant extract from the Blueprint document:

    
    1. In the EGI Blueprint, CERN, as an international research institute, is categorised under EIROFORUM. Every EIROFORUM institute, if willing, will have the
    chance to be a full member of EGI like any NGI. Being a member implies EC co-funding and commitment to international tasks. The regional helpdesk is
    one of these duties. Quoting:
    
    "Major European research institutes represented in the EIROFORUM [4] and ESFRI [5] projects are also invited to contribute to and to benefit from EGI. 
    Associated Membership to EGI is open for the EIROFORUM"
    
    "The tasks and the funding [note: EC co-funding] will be allocated according to the size and commitment of the individual NGIs (and EIROFORUM 
    institutes/ESFRI projects)."
    
    2. User support effort in EGI.org (under the Operations Function; more is available under the User Community Services function): page 14 of the Blueprint
    
    "User Support: 4.5 FTEs
    2 FTE: Maintenance and Operation of a central ticket handling system for grid and network end-to-end problems. User support relies on a central 
    helpdesk, which is a regional support system with central coordination, GGUS. It gives access to user documentation and support, and to a problem 
    ticketing system. 1st line local/regional support by NGIs
    
    2 FTE: Triage: assignment of tickets to the 2nd line support units, ticket escalation end ticket follow-up to ensure they get closed
    
    0.5 FTE: Gathering of requirements for user support tools and process taking input from NGI's and VOs, interoperations of ticketing systems (EGI.ORG + 
    NGI): to take into account additional requirements which may arise with the evolution of the middleware stacks in use, and with the support of new user 
    communities EGI.org coordination and support"
    
    ROC_IT suggested that the TPM effort be provided by a single NGI, always the same one. All other participants found this proposal contrary to the decentralisation concept. The CMS suggestion was to extend the decentralisation of the TPM team, but not to a number of teams as high as the number of NGIs (at least 35!).

    Decision:
    Instead of one TPM team responsible for all GGUS tickets in a given week (today's situation) we should move to a workflow where:

    will route the ticket to the TPM team within the relevant NGI. This should be done gradually, and the number of TPM teams will probably never reach the number of NGIs.
    Central coordination is still necessary for:
    Points to clarify resulting from this discussion are listed in the action list, starting with number 20090402- and the conclusions will shape the new OLA.
  5. ACTION LIST:
    20090129-2
    Document in TPM wiki the new TPMs' coaching and the new TPM role.
    Diana
    Pending. Action transferred to Maria. Will do this in the new version of the TPM OLA.


    20090226-2
    Report at the next USAG meeting on the TPM training at the COD meeting in Bologna.
    Diana
    Done. Vera (ROC_NE) and David (ROC_FR) attended and found such sessions useful. They should be organised regularly in the framework of other events. In the future, they should be cross-NGI.


    20090402-1
    Review the 18 points of 'RESPONSIBILITIES' in section 4 of the current version 0.6 of the TPM OLA.
    All.


    20090402-2
    Evaluate how heavy it is for GGUS to register all ticket submitters and to check each ticket submitter's DN in the database before passing it to the 1st level (TPM at the relevant NGI).
    GGUS developers.


    20090402-3
    Check what will happen with tickets submitted by users with a @cern.ch email or with a CERN DN. If CERN is an "Associate" EGI member, does it have funding and voting rights? Should it provide TPM effort?
    Maria.


    20090402-4
    Make "Impact on GGUS workflow and reputation" the theme of the next USAG. Also make ROC evaluation the subject of the 'Continuous GGUS assessment' agenda item.
    Maria

  6. A.O.B.

  7. Decide on next meeting date: Algorithm is: last Thursday of the month. Next meeting is Thursday 2009-04-30 @ 9:30am CET.
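
The "last Thursday of the month" rule can be checked with a few lines of Python. This is only an illustrative sketch and not part of any USAG or GGUS tool; the function name last_thursday is hypothetical.

```python
import calendar
from datetime import date

THURSDAY = 3  # date.weekday() counts Monday as 0, so Thursday is 3


def last_thursday(year: int, month: int) -> date:
    """Return the date of the last Thursday of the given month."""
    # monthrange() returns (weekday of the 1st, number of days in the month)
    last_day = calendar.monthrange(year, month)[1]
    end_of_month = date(year, month, last_day)
    # Number of days to step back from the end of the month to reach a Thursday
    offset = (end_of_month.weekday() - THURSDAY) % 7
    return date(year, month, last_day - offset)


print(last_thursday(2009, 4))  # 2009-04-30
```

For April 2009 this gives 2009-04-30, the date agreed above.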