1–3 Mar 2006
CERN
Europe/Zurich timezone

User and virtual organisation support in EGEE

2 Mar 2006, 14:25
20m
40-S2-A01 (CERN)

Oral contribution VO management - Portals 2d: VO tools - Portals

Speaker

Flavia Donno (CERN)

Description

Providing adequate user support in a grid environment is a very challenging task due to the distributed nature of the grid. The variety of users and the variety of Virtual Organizations (VO), with a wide range of applications in use, add further to the challenge.

The people asking for support are of various kinds. They can be generic grid beginners, users belonging to a given Virtual Organization and dealing with a specific set of applications, site administrators operating grid services and local computing infrastructures, or grid monitoring operators who check the status of the grid and need to contact a specific site to report problems. To this list can be added network specialists and others.

Wherever a user is located and whatever the problem experienced, the user expects a given set of services from a support infrastructure. A non-exhaustive list is the following:

a) a single access point for support;
b) a portal with well-structured sources of information and updated documentation concerning the VO or the set of services involved;
c) experts who are knowledgeable about the particular application in use, who can even discuss with the user to better understand what he/she is trying to achieve (hotline), and who can help integrate user applications with the grid middleware;
d) correct, complete and responsive support;
e) tools to help resolve problems (search engines, monitoring applications, resources status, etc.);
f) examples, templates, specific distributions for software of interest;
g) integrated interface with other Grid infrastructure support systems;
h) connection with the grid developers and the deployment and operation teams;
i) assistance during production use of the grid infrastructure.

With the Global Grid User Support (GGUS) infrastructure, EGEE attempts to meet all of these expectations.
The current use of the system and the user satisfaction ratings show that this goal has been achieved with some success so far. To date, GGUS has been shown to be able to process up to 200 requests per day, and it provides all of the services listed above. In what follows we discuss the organization of the GGUS system, how it meets the users' needs, and the current open issues.

The model of the existing EGEE Global Grid User Support (GGUS) is as follows. The support model in EGEE can be captioned "regional support with central coordination". Users can submit a support request to the central GGUS service, to their Regional Operations' Centre (ROC) or to their Virtual Organisation (VO) helpdesk. Within GGUS there is an internal support structure for all support requests. The ROCs, the VOs and the other project-wide groups, such as middleware groups (JRA), network groups (NA), service groups (SA) and other grid infrastructures (OSG, NorduGrid, etc.), are connected via a central integration platform provided by GGUS.

The GGUS central helpdesk also acts as a portal for all users who do not know where to send their requests. They can enter them directly into the GGUS system via a web form or e-mail. This central helpdesk keeps track of all service requests and assigns them to the appropriate support groups. In this way, formal communication between all support groups is possible. To enable this, each group has built an interface (e-mail and web front-end, or an interface between ticketing systems) between its internal support structure and the central GGUS application.

In the central GGUS system, first-line support experts from the ROCs and the Virtual Organizations do the initial problem analysis. Support is widely distributed. These experts are called Ticket Processing Managers (TPM): generic TPMs provide generic first-line support, and VO TPMs provide VO-specific first-line support.
These experts can either provide the solution to the problem reported or escalate it to a more specialized support unit that provides network, middleware or grid service support. They may also refer it to specific ROCs or VO experts. Behind the specialized VO TPM support units, people belonging to EGEE/NA4 groups such as the Experiment Integration Support group (EIS) help VO users with on-line support and with the integration of the VO-specific applications with the grid middleware. Such people can also recognize whether a problem is application specific and forward it to more VO-specific support units connected to GGUS. Generic TPMs and VO TPMs also have the duty of following tickets, making sure that users receive an adequate answer, coordinating the effort of understanding the real nature of the problem, and involving more than one second-level support unit if needed. The following figure depicts the ticket flow.

To provide appropriate user support, the distributed structure of EGEE and the VOs has to be taken into account. The community of supporters is therefore distributed. Their effort is coordinated centrally by GGUS and locally by the local ROC support infrastructures. The ROC provides adequate support to classify the problems and to resolve them if possible. Each ROC has named user support contacts who manage the support inside the ROC and who coordinate with the other ROCs' support contacts. The classification at this level distinguishes between operational problems, configuration problems, violations of service agreements, problems that originate from the resource centres (RC), and problems that originate from global services or from internal problems in the software. Problems that are positively linked to a resource centre are then transferred to the responsibility of the ROC with which the RC is associated.

MEETING USER NEEDS

As explained above, GGUS therefore provides a single entry point for reporting problems and dealing with the grid.
In collaboration with the EGEE EIS team, the EGEE User Information Group, NA3, and the entire EGEE infrastructure, GGUS offers a portal where users can find up-to-date documentation, and powerful search engines to find answers to resolved problems and examples. Common solutions are stored in the GGUS knowledge database, and Wiki pages are compiled for frequent or undocumented problems and features. GGUS offers hotlines for users and supporters and a VRVS chat room to make the entire support infrastructure available on-line to users. Special tools and grid middleware distributions are made available by the NA4/EIS team for GGUS users. GGUS is interfaced with other grids' support infrastructures, as in the case of OSG and NorduGrid.

GGUS is also used for daily operations to monitor the grid and keep it healthy. Specific user problems can therefore be communicated directly to the Grid Operation Centers and broadcast to the entire grid community. GGUS is also used to follow and track down problems during stress-testing activities such as the HEP experiments' production data challenges and the service challenges.

OPEN ISSUES

Even though GGUS has proven to provide useful services, there are still many things that need improvement. Concerning users and VOs in particular, we have identified the following issues.

Small VOs do not have the resources to implement their part of the model

The large VOs, such as the LHC experiments, have people who provide support for the applications which the VO has to run as part of its work. These people are contacted by GGUS when tickets are assigned to the VO or when the problem needs immediate or on-line attention. It has proven difficult for some of the small VOs to provide such a service. In this case, GGUS still provides support for the VO, but if the problem is application related and cannot be resolved, then it has to be put into the state 'unsolvable'.

Supporters have other jobs to do

In EGEE, almost everyone providing support does so as part of their job. It is not usually a major part of their job. Sometimes it is difficult to ensure responsiveness. There is a small team which maintains and develops the GGUS system.

Supporters are concentrated in a few locations

The resources of the grid are widely distributed over 180 locations, and there are people in all of these locations looking after the basic operation of the computers. However, this is not the case for higher-level support such as support for a VO application. This tends to exist in only a small number of locations, with a small number of supporters.

Scalability is constrained by the availability of supporters

The number of people who can provide support for basic operations is large, but the number of people who can provide support for higher-level services is small. As the VOs become larger, this will become a constraint to growth unless more supporters are found.

Limited experience in handling a large number of tickets

As part of the development of the GGUS system, it has been exercised by generating tickets. As the system is built from industry-standard software parts using Remedy and Oracle, it has been found to be reliable. We believe, however, that if large numbers of tickets are submitted, limitations in the system will become apparent.

Limited engagement of existing VOs in the implementation of GGUS

There is an organisation within EGEE called the Executive Support Committee (ESC). The ESC has representatives from all of the ROCs of EGEE. This organisation meets once per month by telephone to discuss the operations and development of the support system and to decide on actions and priorities for the work. The present VOs have found it difficult to provide people for involvement with this work.

CONCLUSION

The GGUS system is now ready for duty.
During 2006, it is expected that there will be a large number of tickets passing through the system as the LHC VOs move from preparing for service to being in production. It is also expected that the number of Virtual Organisations will grow as the work of EGEE-II proceeds. There will also be an increase in the number of support units involved with GGUS, and an increase in the number of ROCs and RCs.

Acronyms

EGEE: Enabling Grids for E-sciencE
EIS: Experiment Integration Support
GGUS: Global Grid User Support
HEP: High Energy Physics
JRA: Joint Research Activity of EGEE
LHC: Large Hadron Collider
NA: Network Activity
OSG: Open Science Grid
RC: Resource Centre
ROC: Regional Operations' Centre
SA: Service Activity
TPM: Ticket Process Management
VO: Virtual Organisation
VRVS: Virtual Rooms Videoconferencing System
Wiki: Web technology for collaborative working
