Operations meeting

Europe/Zurich
CERN

Nick Thackray
Description
Phone: +41 22 767 7000 "Grid Operations/Ian Bird"
actionlist

Weekly Operations' Meeting 2005-05-09

Agenda: http://agenda.cern.ch/fullAgenda.php?ida=a045866
Operations' Manual: http://lcgdeploy.cvs.cern.ch/cgi-bin/lcgdeploy.cgi/lcg-docs/EGEE-CIC-Operational-Manual/opMan.pdf
CA rpm instructions: http://cern.ch/grid-deployment/lcg2CAlist.html and https://cic.in2p3.fr/index.php?id=rc&subid=rc_config
Template for weekly report submission by the ROC: https://cic.in2p3.fr/index.php?id=roc&roc_page=1

Participants:
Asia Pac.:  Min Tsai
CE:              Marcin Radecki 
CERN:        Maria Dimou (Secretary), Laurence Field, Ian Neilson, Piotr Nyczyk, Andrea Sciaba, Nick Thackray (chairman).
CH/DE:      Sven Hermann
FR:              Gilles Mathieu, Rolf Rumler
IT:                Alessandro Cavalli
NE:              
SEE:           Kostas Koumantaros, Ognjen Prnjat
SWE:           Gonzalo Merino
Russia:       
UK/IR:       Philippa Strange, Steve Traylen.
Apologies: 
Absences: Nobody joined from NE or Russia.

Comments on the notes of the last meeting:
Not discussed.

CIC & ROC reports:

Exiting CIC-on-duty (UK):
A full report is in the CIC-on-duty log, starting at the entry [2005-05-09 09:27] - philippa strange

ROC reports must be uploaded to this meeting's agenda as "More Information" by the CIC-on-duty and the ROC managers themselves by 11am CET.

The new report template, written by Osman Aidel, is linked from the ROC views of the CIC web site. Gonzalo complained that the form offers him sites that do not belong to his federation. This will be improved soon. Gilles said that staff other than the ROC manager or deputy should be allowed to fill in the form when the task has been delegated to them. Every federation should send him the list of names of the people who should be allowed to fill in the form.

Issues discussed in addition:

R-GMA: Laurence used the Site Functional Tests (SFTs) to identify the common problems with R-GMA. A few common, easy-to-solve problems were found, and Laurence sent an email to the rollout list offering to help with any R-GMA-related problem sites may still have. Only 9 sites show better results, and Laurence received no response to his offer. ROCs, please follow up with your sites to make sure R-GMA is installed correctly. From next week onwards the R-GMA tests will become critical.

Sven said that the percentage of working R-GMA sites reported for CH/DE (50%) should be higher. Of the 8 sites in the region, 3 show a failing job submission, which should be classified as a general site problem, not an R-GMA one. Laurence disagreed. He said that if a job cannot be submitted to check whether R-GMA is working, then for him R-GMA is not working.
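The two positions can be illustrated with a little arithmetic. The sketch below is only an illustration: it assumes that the reported 50% means 4 of the 8 CH/DE sites pass the R-GMA test, which the notes do not state explicitly, and the function name working_share is made up for this example.

```python
def working_share(passing: int, total: int, excluded: int = 0) -> float:
    """Percentage of sites passing the R-GMA test, leaving `excluded` sites out of the count."""
    counted = total - excluded
    if counted <= 0:
        raise ValueError("no sites left to count")
    return 100.0 * passing / counted


TOTAL_SITES = 8              # from the notes: 8 sites in CH/DE
JOB_SUBMISSION_FAILURES = 3  # from the notes: 3 sites fail at job submission
PASSING = 4                  # assumption: 50% of 8 sites pass the R-GMA test

# Laurence's view: a site where the test job cannot run counts as an R-GMA failure.
laurence_view = working_share(PASSING, TOTAL_SITES)

# Sven's view: job submission failures are general site problems and are left out.
sven_view = working_share(PASSING, TOTAL_SITES, excluded=JOB_SUBMISSION_FAILURES)

print(f"All sites counted: {laurence_view:.0f}%")
print(f"Job submission failures excluded: {sven_view:.0f}%")
```

Under this assumption the figure depends entirely on whether the 3 sites with failing job submission stay in the denominator.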

The "freedom of choice" tool allows VOs to choose their sites. This tool is meant for VO managers, and access is open only to registered DNs. People who think they should be able to open the URL should email Judit.Novak@cern.ch

SWE: GGUS allows problems to be listed per ROC but not per site within the region. Write to egee-ggus-feedback@cern.ch with your suggestions.

SEE: The GGUS announcement this morning about a planned maintenance stoppage on Wednesday from 4 to 9am should have been more detailed, and the stoppage should be shorter. Given that this is now a tool we use to run our operational service, such stoppages should not be frequent.

LCG2 2_4_0 Status:

The problem with sites that don't upgrade still stands. Kostas said that in the 2_4_0 yaim the Info Provider had some bugs, which were quickly fixed but delayed the upgrades. One site per region should install the release first to help the others in the region.

What to do with sites on old release versions? The CIC-on-duty or the relevant ROC should change the site's status to suspended in GOCdb. For example, nothing has happened to the HPBR site, whose suspension was decided under AOB on 2005-04-25. The production status field is available in GOCdb2, which is not yet in use. Steve will carry out the delayed suspension of HPBR.

Ian asked what the procedure is to re-integrate a site into the Grid. Philippa suggested asking such sites to go through a site re-registration process. Another possibility would be to leave it to the discretion of the CIC-on-duty staff, given that, while a site is suspended, SFTs can't run there and the site manager should contact the CIC-on-duty and the Operations' team off-line. Rolf said site add/remove procedures could be discussed in Bologna at the LCG Workshop in the week of May 23rd. ACTION 2005-05-09--1

NB! All CICs, please, re-visit the on-duty rotation agenda.

Security report (Ian Neilson):
- Ian received a question from the GGUS maintainers about where to send tickets in the security category. He will probably make the list of security contacts available for such assignments.
- CA list v. 0.29 has been out for 10 days. Here is the relevant announcement:

From davidg@nikhef.nl Wed May 11 16:09:04 2005
Date: Sat, 30 Apr 2005 17:03:05 +0200
From: David Groep 
To: EUGridPMA Announcements 
Cc: EUGridPMA Discussion List ,
    Joint Security group ,
    "Mailing list for integration activities of EGEE (Middleware)"
    ,
    EGEE middleware testing ,
    DEISA Security Contact ,
    JRA3 ,
    Maarten LITMAATH 
Subject: EUGridPMA new accredited CA distribution (version 0.28)

Dear CAs, Relying Parties, Users, and all others interested,

Release 0.29 of the CA distribution available
---------------------------------------------
A new distribution of Accredited Authorities by the EUGridPMA, release
version 0.29, is now available for download from the EUGridPMA Repository

     https://www.eugridpma.org/distribution/current/

You can download the new packages and install them at your convenience.

Changes from 0.28 to 0.29
-------------------------
(27 April 2005)

* New root certificate for the NIIF/Hungarnet CA, following the TACAR update
* Preliminary inclusion of the SWITCH CA certificates. Note that the
   ordering of the components in the end-entity DN will currently prevent
   the end-entity certs to be validated (this is being addressed by SwissSign)
* Modified layout of the tar distribution, in preparation for support of
   multiple authentication profiles

Note also that from this release on the (expired) DOESG root CA has been
withdrawn from the "accredited/" directory.

For those using RPM based linux distribution, a "meta-RPM" is available
from the repository, ca_policy_eugridpma-0.29-1.noarch.rpm, that contains
dependencies on the RPMs of all accredited CAs. The repository is
suitable for "yum" based automatic updates.

The next release (0.30) of the CA RPMs is to be expected around July 2005,
(of course barring special circumstances).

	Regards,
	David Groep
	Chair.

PS: Please circulate this announcement widely as appropriate.


So far we have been waiting for sites to upgrade to LCG2 2_4_0, but next week sites will be asked to upgrade and will be given 3 weeks to do so.

Pre-production service status:
Participating sites: CERN, NIKHEF, CNAF, PIC, CESGA. SFTs are now ported to gLite. This week the pre-production service will be up and running. Access will be open to all DTEAM members.

 

Next CIC-on-duty:

  • The CIC-on-duty this week is IN2P3.
  • The actual work must be done by the ROCs and the follow-up by the CICs. The CIC-on-duty just does the monitoring.
  • The ROC managers should remember that when a problem at a site is solved the site needs to go through a period of quarantine before joining production again. This is mentioned in the Operations' Manual.
  • The CIC site (requires personal certificate loaded) is http://cic.in2p3.fr .
  • Link to the daily Site Status reports.
  • Savannah should be used for the Tasks' hand-over and follow-up. The Operations' Manual should be the basis.
  • When the CICs report also as ROCs, they should do this in separate reports.
  • Sites' status of updates can be checked in the daily reports e.g. http://cern.ch/lcg-testzone-reports/cgi-bin/lastreport.cgi

Action List now in a separate file.

It wasn't discussed at this meeting.

A.O.B.

Alessandro complained about the slow response of the SFT pages. Piotr will move the site to a web server where neither AFS nor the CERN web services will cause delays. Moreover, certificate-based identification will be implemented so that SFT access becomes more secure.

Next meeting:

  • The next regular telephone meeting will take place on Monday May 23rd 2005 from 14:00 to 16:00 in conference room 28-R-015.

Maria Dimou, IT/GD, Grid Infrastructure Services

