grid-operations-meeting@cern.ch Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
Attendees:
OSG operations team
EGEE operations team
EGEE ROC managers
WLCG coordination representatives
WLCG Tier-1 representatives
other site representatives (optional)
GGUS representatives
VO representatives
VRVS "plane" room will be available 15:30 until 18:00 CET
actionlist
minutes
16:00
→
17:25
WLCG-OSG-EGEE Operations Meeting
28-R-15
16:00
Feedback on last meeting's minutes 5m
Minutes
16:05
EGEE Items 20m
<big>Grid-Operator-on-Duty handover</big> 5m
From Russia ROC (backup: Italy) to CERN ROC (backup: DECH ROC)
Tickets:
Open 55
Closed 30
2-mail 13
Modified 48
All 146
Notes:
No information was available on the enabling of SFT for PPS.
The dashboard has been very unstable and has not refreshed since Fri, 29 Sep 2006 16:16:19 +0200.
<big>Job priorities WG</big> 10m
Summary of the Job Priorities WG recommendations and deployment plans
Speaker:
Jeff Templon, Dietrich Liko
transparencies
<big>Move to the new version of FCR</big> 5m
During the migration to the new version of FCR, the VOs should be reminded to apply their settings in the new version, as the old one will be phased out by 6 October 2006. This is especially important because of the 'dteam' => 'ops' change (currently most VOs don't have a Critical Test set defined for 'ops').
VOs need to check that their settings in the new FCR tool are correct
Owners of top-level BDIIs which use FCR need to use the new LDIF
Speaker:
Judit Novak
<big>Update on SLC4 migration</big> 5m
The goal is to port the gLite components into the ETICS system by the end of October, which automatically means that they will be built on SLC3 (ia32) and SLC4 (ia32 and x86_64). In theory, builds on Debian and ia64 should also be possible, although the build system on those platforms has not been completely tested. Components are ported by subsystem, with priority given to those required to have the UI and WN ready first. In any case, components are built as they become available.
The corresponding packages can be found in the ETICS repository: http://etics.cern.ch:8080/repositoryBrowser/
We will also work on a script to provide the package list in a form suitable for populating the gLite APT repository directly.
<big>Summary of the status of the request to allow users to pass arguments to the underlying LRMS</big> 5m
Speaker:
Alessandra Forti
<big>Savannah bugs to follow up</big> 5m
bugs 17738 and 15746 (both GFAL): work will start in around 4 weeks' time. The delay is because SRM 2.2 work has to be completed first. Feedback should be given if this is too long (with justification)
bug #17738: GFAL info system timeout too low
bug #15746: GFAL should optimize LDAP queries
bug 19878: work is currently due to start at beginning of December. Feedback should be given if this is too long (with justification)
bug #15878: DNs with "." are not properly handled
<big>EGEE issues coming from ROC reports</big> 15m
Reports were not received from these ROCs: AP, SWE, UKI
Item 1 (NE ROC): A major concern for the Netherlands is the possible drop of support for VOMS-enabled Pre-WS GRAM on the gLite-CE. A number of the VOs that we support use Nimrod to submit jobs, which works with Pre-WS (VOMS-enabled) GRAM, at least as long as Globus packages are in their toolkit. Also see the remarks made for the SARA-MATRIX site.
Item 2 (SEE ROC): 1) AEGIS
Yet again, a non-official and invalid SFT, sent to our site by Rafal Lichwala from the SFT Admin Tool on 27-09-2006 10:51, is present in our CIC daily report. While we don't mind having any regular jobs sent to our site through supported VOs, the CIC daily report should not contain such an SFT failure. To make matters worse, this SFT failure is triplicated.
Item 3 (Italy ROC): We were not able to reproduce, as a 'dteam' user, the errors in the SFT tests for this day that were marked as critical (CT). What would be the procedure to test as 'ops'? Should we ask to become members of the 'ops' VO?
(UKI ROC) The problem of the OPS failure with 3rd party replication is being investigated. It seems this is a very limited problem, affecting only this VO and only lxn1183.cern.ch as a remote SE. As the site has no one in the OPS VO to aid with testing it's very hard to debug this.
We suggest that at least one support person in each ROC be a member of the OPS VO to help sites with problems like this.
16:25
OSG Items 5m
<big>Item 1</big> 5m
16:30
WLCG Items 35m
<big>WLCG Service Commissioning report and upcoming activities</big> 15m