Agenda: http://agenda.cern.ch/fullAgenda.php?ida=a045854
Operations' Manual: http://lcgdeploy.cvs.cern.ch/cgi-bin/lcgdeploy.cgi/lcg-docs/EGEE-CIC-Operational-Manual/opMan.pdf
Template for weekly report submission by the ROC: http://cern.ch/egee-docs/list.php?dir=.\operational_tools\&
Reports are now submitted by the CIC-on-duty and the ROCs according to this template. Helene Cordier (IN2P3) suggested that Nick Thackray link it from the CIC web site. Maria Dimou (CERN) suggested also linking it from the meeting agenda.
Reports should be sent to the project-egee-roc-managers@cern.ch list by 11am CET. They are linked from the meeting agenda. CERN again submitted no report, due to the EGEE review.
Issues discussed in addition:
- INFN was the CIC-on-duty last week. CERN took over the operation during the period 9-11 February for demonstration purposes at the EGEE review. Alessandro Cavalli (INFN) reported too many job failures (the proxy expires before the job reaches its turn to run), for reasons that are not fully understood. INFN suggested creating a faster queue that accepts only short jobs.
- Frederic Schaer (IN2P3) suggested that the GOC monitoring jobs should not be re-submitted if they have failed or are still running (24 hours is the maximum running time). David Kant said this functionality will be implemented in the coming release in March. A possible solution would be to have one RB per VO (the DTEAM one would handle the monitoring jobs). Judit Novak is currently working on scripts that will extract the BDII information from the GOC db. Min's GIIS service page shows the sites supporting a given VO, e.g. http://goc.grid.sinica.edu.tw/gstat/service.html#dteam, provided the Information System contains correct information.
- Piotr Nyczyk (CERN) reported difficulty in determining the exact status of the Fraunhofer ITWM site. It is in the GOC database and is being monitored, but it is not in the BDII configuration file, which is a prerequisite for a 'certified' site according to the Site Registration Requirements document.
- Steve Traylen (RAL) asked when the SE clean-up procedure will be ready. This will be combined with GGUS procedures expected by mid-March 2005 (*** ACTION 2005-02-07--1 ***).
- Sven Hermann (FZK) suggested that if a site wishes to stop being monitored, it has to change the value of the monitoring flag in GOC db.
- Min Tsai (Taiwan) said there was nothing to report from that site's operation last week, due to the Chinese New Year.
- The term 'non-functional site' will be used instead of 'bad site'. Criteria for classifying a site as non-functional include:
Security report (Ian Neilson):
- Nothing to report.
Next CIC-on-duty:
(*** ACTION 2004-11-29--1 ***) Vincent Breton to check the "Migration to SLC3 plan" of non-HEP VOs with the other application managers. As of mid-February 2005, the LHC experiments are still not clear about the status of their software. OPEN
(*** ACTION 2004-11-29--5 ***) Steve Traylen, with help from Laurence Field, to document on the Wiki page for the sites how to block users when necessary. DONE on 2005-01-17 (see the FAQ entry). CLOSE at next meeting.
(*** ACTION 2004-12-06--1 ***) Sites should accelerate migration to SLC3, at least on the service nodes due to security considerations. OPEN
(*** ACTION 2004-12-13--1 ***) Escalation procedures in the Operations' Manual should clarify that sites which run outdated LCG2 versions or do not respond to CIC-on-duty action prompts will be disclosed in this meeting. OPEN
(*** ACTION 2004-12-13--2 ***) Savannah accounts were created for all the ROCs. New users had to activate these accounts but as some failed to do it in time, their accounts have now expired. Now A. Kryukov has to re-create the Russian ROC account(s). DONE CLOSE at next meeting.
(*** ACTION 2004-12-13--4 ***) Nick Thackray to send email to the 3 relevant ROCs about Helene Cordier's reminder to UK, IT and RU to submit their test suites as agreed in The Hague. DONE CLOSE at next meeting.
(*** ACTION 2005-01-03--1 ***) Grid Infrastructure Support Section to give generic (functional) DNS aliases to important service nodes, ensuring transparent service changes. OPEN
(*** ACTION 2005-01-10--1 ***) Frederic Schaer will enter a bug in Savannah, project=lcgoperation, requesting a site-level proxy to protect against asynchronous updates of the CA list on the various nodes at a site (RB, WNs etc.), which currently cause credential verification failures. DONE CLOSE at next meeting.
(*** ACTION 2005-01-10--2 ***) Laurence Field or Markus Schulz to prototype a dump of the GOC database every few hours and provide a read-only copy for the community. DONE in Taiwan. CLOSE at next meeting.
(*** ACTION 2005-01-10--3 ***) Support for the Asian sites needs to be formalised. Requested by CERN in the relevant CIC-on-duty report. Assigned to Taiwan, the Asian sites' ROC. OPEN
(*** ACTION 2005-02-07--1 ***) Nick & Flavia to send Piotr, for publication in the Operations Manual, the procedure that sites should use when they need to inform a VO that the SE is filling up. OPEN
(*** ACTION 2005-02-07--2 ***) Gilles Mathieu and Markus to put the web page with CA rpms, currently under http://cern.ch/markusw, on the CIC web site. OPEN
(*** ACTION 2005-02-07--3 ***) GOC database manager David Kant to add an operational days field, in addition to operational hours, e.g. Operational hours: 0900 - 1700 (GMT), for every site. Sites to correct their timezone in GOC db. ROC managers should do the same, each on their page under http://cern.ch/egee-sa1/ROC-support.htm. OPEN
(*** ACTION 2005-02-07--4 ***) Nick to define metrics on site performance, at the request of John Gordon and using some information circulated by Ian Bird. OPEN
(*** ACTION 2005-02-14--1 ***) Min Tsai (CERN) noticed that the CNAF LCG2 Release shown in the monitoring tools is 2_2_0. This is an editing mistake and it should be corrected by CNAF. OPEN
(*** ACTION 2005-02-14--2 ***) ROCs to make sure that sites in their region which are in the GOC database also appear in the BDII configuration file, which is a prerequisite for a 'certified' site according to the Site Registration Requirements document. OPEN
(*** ACTION 2005-02-14--3 ***) Nick to clarify the future release strategy. The plans Ian Bird presented at the EGEE review differ from those that Maarten Litmaath published by email. OPEN
(*** ACTION 2005-02-14--4 ***) Min to add the colour code to the Regions' page as well. OPEN
(*** ACTION 2005-02-14--5 ***) Participants requested a historical site view, i.e. one that shows how long a given site was out during the last week. Piotr? OPEN
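Note on ACTION 2005-02-14--2: the check requested there (and the Fraunhofer ITWM case above) amounts to comparing two lists of site names. The following is only a minimal sketch in Python. It assumes the site names have already been extracted from the GOC database and from the BDII configuration file into plain lists; the function name and the example site names are hypothetical, and no particular export or file format is implied.

```python
def sites_missing_from_bdii(goc_sites, bdii_sites):
    """Return the sites registered in the GOC database that do not
    appear in the BDII configuration, sorted alphabetically.

    Both arguments are iterables of site-name strings (an assumption;
    the real sources would have to be parsed into this form first).
    Names are compared case-insensitively, ignoring surrounding spaces.
    """
    def normalise(name):
        return name.strip().lower()

    bdii = {normalise(site) for site in bdii_sites}
    return sorted(
        site.strip() for site in goc_sites if normalise(site) not in bdii
    )


if __name__ == "__main__":
    # Hypothetical site names, for illustration only.
    goc = ["SITE-A", "Site-B", "site-c"]
    bdii = ["site-a", "SITE-C"]
    for site in sites_missing_from_bdii(goc, bdii):
        print("In GOC db but not in BDII configuration:", site)
```

A ROC could run such a comparison for the sites in its region and follow up on any site that is listed.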
Maria Dimou, IT/GD, Grid Infrastructure Services