Project Kick-off meeting

Europe/Zurich
513/1-021 (CERN)

Veronique Lefebure (CERN)
Description
Presentation of the "LHC Computing Piquet Services" project.
  • Present: Jamie Shiers, Nick Thackray, Maria Girone, Ludwig Pregernig, Veronique Lefebure
  • Introduction
      • Questions and comments raised while going through the mandate: none.
      • Questions and comments raised while going through the CERN Piquet Rules:
          - How can one oblige someone to be on Piquet?
          - What will the process be regarding modification of contracts?
          - The rules requiring a minimum of 5 persons on a rota but no more than 9 weeks of duty per person per year conflict with each other, and they do not take into account people being on holiday.
          - The leave compensation is not realistic: people most probably don't have the time to take the leave anyway, and it is not fair if they simply lose it.
          - People should be free to choose between monetary compensation and leave compensation, for both the on-call compensation and the intervention compensation.
          - The intervention time for which compensation is offered should be the same whether the person intervenes from home or comes to CERN.
      • Questions and comments raised regarding the LCG MoU numbers:
          - The numbers should be reviewed by the operation boards of the WLCG Collaboration (see Jamie's notes from 10 September 2006, attached to point 3 of the agenda below).
  • Working Plan
      The attached slides were presented as a basis for a per-area service analysis in terms of dependencies (and hence criticality) and of expert needs and availability. It was agreed that each participant would produce similar diagrams and tables for the next meeting.
  • General Considerations about Piquet Services
      - See the attached notes produced by Jamie last September and circulated after the meeting.
      - Jamie suggested that reducing the Piquet Service from 24/7 to 16/7 could already be enough and would provide a much better lifestyle.
      - Jamie pointed out that there is little sense in having a Piquet on the IT side if there is none on the experiment side to ensure the smooth running of the VO boxes.
      - Jamie said that a "catch-up" scenario may also allow the Piquet time window to be reduced. He showed a recent plot where CMS had a transfer problem for a few hours but was able to catch up quickly once the problem was fixed.
      - Jamie suggested that a first step, before starting a Piquet Service, would be to try to reach the MoU numbers during daytime.
      - Maria said that the DB service already has 2 persons on Piquet, with no compensation. The group would therefore welcome the creation of official Piquet Services. The group has already delegated as many troubleshooting interventions as possible to the operators on the one hand, and increased the robustness of the infrastructure on the other. However, they are left with cases where DB experts are still really needed to keep the services running, and the number of such problems grows as the number of services increases.
      - Ludwig said that being on call is a pain; documentation and training are needed.
      - Ludwig said that the Network team already has a Piquet (outsourced), but it does not escalate problems; instead, it waits for the next working day to fix the problems it cannot fix itself.
      - Ludwig pointed out that network issues can also involve "the other side" of the network, not only CERN.
      - Nick said that the GRID services already rely on a few people working on a best-effort basis, some of whom barely sleep at night. What will happen when they are sick or on holiday?
      - Nick asked what Service Level we really need.
      - Nick said that few GRID services can be made more reliable than they are now (e.g. by ensuring that fail-over systems are in place).
      - Veronique said that it was not clear who can call a Piquet: users are often the best monitoring system, but they cannot trigger a Piquet call. One would probably need something like a GMOD Piquet. Jamie answered that this has already been decided: only a selected list of people per experiment will be allowed to call the Piquet.
      - Veronique said that it may be useful to be allowed to organise a Piquet Service for a period shorter than one month, to cover periods where a known bug is causing problems but is in the course of being fixed.
      - Veronique said that being able to rely on a Piquet may reduce the motivation to build robust services.
      - Veronique said that it would be good to put in place a systematic check/test of the troubleshooting procedures.
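The conflict raised about the CERN Piquet Rules (a minimum of 5 persons on a rota, but no more than 9 weeks of duty per year) can be checked with a rough calculation. This is only a sketch under two assumptions not stated in the rules as summarised here: the rota must cover all 52 weeks of the year with exactly one person on duty per week, and the 9-week limit applies per person.

```latex
% Assumption: one person on duty per week, 52 weeks of coverage per year
\frac{52\ \text{weeks}}{5\ \text{persons}} = 10.4\ \text{weeks per person} > 9
\qquad
\left\lceil \frac{52}{9} \right\rceil = 6\ \text{persons}, \quad 6 \times 9 = 54 \ge 52
```

Under these assumptions a rota of exactly 5 people cannot stay within the 9-week limit, and even the smallest compliant rota (6 people) leaves only 2 weeks of slack for swaps around holidays or sickness.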
  • 14:00-14:20
    Introduction 20m
      • The Project Mandate (see attached document)
      • The HR Regulations regarding Piquet Services (see AC23)
      • The LCG MoU
    Speaker: Veronique
    AC23
    LCG MoU
    Piquet Rules summary
    Project Mandate
  • 14:20-14:40
    Working Plan 20m
    1. List of Services in each area
    2. their dependencies
    3. their respective Maturity
    4. their respective Criticality
    5. Expert persons: need and availability
    Speaker: Veronique
    Service List proposal
  • 14:40-15:00
    General Considerations about Piquet Services 20m
    Speaker: All
    Notes from Jamie, circulated after the meeting
  • 15:00-15:05
    Next meeting 5m