28-R-15 (CERN conferencing service; joining details below)
email@example.com
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum where sites get a summary of the weekly WLCG activities and plans.
OSG operations team
EGEE operations team
EGEE ROC managers
WLCG coordination representatives
WLCG Tier-1 representatives
other site representatives (optional)
To dial in to the conference:
a. Dial +41227676000
b. Enter access code 0140768
NB: Reports were not received in advance of the meeting from:
ROCs: France, Russia, SWE
VOs: no VO reports were received this week
List of actions
Feedback on last meeting's minutes
<big> Grid-Operator-on-Duty handover </big>
From: France / UK-I
To: CERN / CE
*Occasional difficulties reaching the SAM web interface*
ANSWER: For the moment the SAM portal is quite functional. Judit is now working on improving the SQL queries used by the SAM portal.
*Last escalation step reached for site YerPhI*
=> Answer from the site:
The SAM test failures are caused by a known issue in the SE software (https://gus.fzk.de/pages/ticket_details.php?ticket=30752). We have been advised to upgrade to the latest certified version and are currently working on the upgrade.
<big> PPS Report & Issues </big>
PPS reports were not received from these ROCs:
AP, FR, IT, RU, SEE, SWE
Re-organisation of the PPS:
The PPS is being re-organised with the aim of:
making the service more suitable for use by the HEP VO
extending the scope of the pre-deployment testing
In this context, a spreadsheet containing a summary inventory of the services and activities currently run within the PPS was prepared and distributed to the PPS sites.
The Service Inventory, based on information available in the GOC DB, can be found at www.cern.ch/pps/index.php?dir=./site/
PPS sites and EGEE ROCs are kindly invited to provide feedback and corrections to the spreadsheet by next week. The information collected will later be used as an assessment point for the re-organisation.
Feedback should be sent to the mailing list firstname.lastname@example.org
Issues from EGEE ROCs:
ROC CE: There is a possible bug in the latest lcg_utils (lcg_util-1.6.8-1.slc4). See https://gus.fzk.de/pages/ticket_details.php?ticket=33262
All three PPS sites from ROC CE have problems with lcg-cr.
gLite 3.1.0 PPS Update 19 was released to production and is now in pre-deployment testing. It includes:
WN 3.1 for SL4 64-bit
lcg-vomscerts-4.8.0, which adds the next certificate for biomed and egeode
A new update, gLite3.1.0 PPS Update20 is in preparation.
This Update will introduce the MONBOX on the 3.1 baseline (for SLC4)
<big> EGEE issues coming from ROC reports </big>
(ROC DECH): A GGUS ticket was raised because the SAM job submission (JS) test was failing. The reason was that the test jobs hit the resource limit of the queue; SAM jobs need to be submitted with proper requirements. (DESY-HH)
(ROC DECH): Problems with "larger" input sandboxes on the WMS, where larger means 10 MB or more, even though the WMS is configured to accept up to 100 MB. Jobs go into the running state but the input sandbox does not arrive on the WN.
GGUS ticket: 33136 (DESY-HH)
(ROC DECH): Problems publishing accounting data: APEL reports that data have been missing for quite some time (since Oct 2007), although we have in fact been publishing since then, perhaps not everything. The resulting large number of accounting records cannot then be published; publishing fails with:
Exception in thread "main" java.lang.OutOfMemoryError.
We are getting tired of spending a lot of effort on this every other week. (DESY-HH)
** Has this been reported through GGUS? **
a) I was aware of a long-standing discussion between DESY and Dave, but I cannot see an open GGUS ticket.
b) DESY runs multiple CEs. There have been problems with such sites, but we have successfully got most of them running. We are also working with the YAIM team to make configuring multiple CEs easier.
c) The problem of catching up on a large number of job records is recognised and a solution is being researched (but see d). Meanwhile, Cristina has a manual method of helping sites catch up by inserting their data directly into the database. I am afraid it requires her manual intervention, so it will have to wait. Following CERN's example, I recommend that sites which run a lot of jobs publish more than once per day.
d) The Gap Publisher allows a site to publish for a specified time interval. This is designed to help sites fill gaps when publishing failed but can also be used to reduce the number of records published at once and thus reduce memory problems.
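The idea behind d) can be sketched as follows: instead of publishing a months-long backlog in a single run, split the gap into fixed-size windows and publish each window separately, so that fewer records are held in memory at once. This is a minimal illustration only; the helper name `split_interval` and the 30-day window are hypothetical and do not reflect the Gap Publisher's actual interface or configuration.

```python
from datetime import date, timedelta


def split_interval(start: date, end: date, days: int) -> list[tuple[date, date]]:
    """Split the inclusive range [start, end] into consecutive windows of at most `days` days.

    Hypothetical helper for illustration; not part of APEL or the Gap Publisher.
    """
    if days < 1:
        raise ValueError("window size must be at least one day")
    windows = []
    current = start
    while current <= end:
        # Each window covers `days` calendar days, truncated at the end of the range.
        window_end = min(current + timedelta(days=days - 1), end)
        windows.append((current, window_end))
        current = window_end + timedelta(days=1)
    return windows


if __name__ == "__main__":
    # Example: a backlog from October 2007 to the end of February 2008, in 30-day windows.
    for lo, hi in split_interval(date(2007, 10, 1), date(2008, 2, 29), 30):
        # Assumed step: each window would be handed to a separate publishing run.
        print(lo.isoformat(), "->", hi.isoformat())
```

Smaller windows mean more publishing runs but a smaller number of records per run, which is the trade-off item d) describes for avoiding out-of-memory failures.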
(ROC UKI): Why is MON not yet supported on SL4? This seems odd, as it is Java-based.
<big> gLite Release News</big>
An update to gLite (3.1 Update 15), containing the new certificate of the VOMS server for the VOs biomed and egeode, will be released very soon (today).
<big>Support for gLite 3.0 services</big>
We plan to stop issuing updates for the following gLite 3.0 services:
SERVICE - 3.1 RELEASE DATE
glite-BDII - 21/11/07
lcg-CE - 12/11/07
glite-LFC_mysql - 14/12/07
glite-LFC_oracle - 14/12/07
glite-DPM_mysql - 14/12/07
glite-DPM_disk - 14/12/07
glite-TORQUE_server - 12/11/07
<big>Pre-release version of the SAM web services</big>
As announced last Friday (Feb. 22) through the same-announce mailing list, a new pre-release version of the SAM web services (lcg-sam-server-ws-0.11.0) has been installed on the SAM Validation instance.
All information on the bugs fixed, configuration changes and validation portals available for testing is described at:
and in particular for this new RPM at:
People are encouraged to review these changes, adapt their code (if necessary) and test the new interfaces as soon as possible.
<big>Glue 2.0 Draft</big>
Please find at the following URL the initial draft of Glue version 2.0, which will be presented at OGF 22.
If anyone has any comments or suggestions, please email them directly to me and I will merge them together to form a response from EGEE.
<big> Where to get dCache updates during CCRC '08 </big>
This is to clarify that during CCRC '08 WLCG sites should take dCache updates from the official dCache repositories (see http://www.dcache.org/ for details).
<big> WLCG issues coming from ROC reports </big>
None this week
<big>WLCG Service Interventions (with dates / times where known) </big>