28-R-15 (CERN conferencing service; joining details below)
Nick Thackray, Steve Traylen
Description
grid-operations-meeting@cern.ch
Weekly OSG, EGEE and WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum where sites get a summary of the week's WLCG activities and plans.
Attendees:
OSG operations team
EGEE operations team
EGEE ROC managers
WLCG coordination representatives
WLCG Tier-1 representatives
other site representatives (optional)
GGUS representatives
VO representatives
To dial in to the conference:
a. Dial +41227676000
b. Enter access code 0140768
NB: Reports were not received in advance of the meeting from:
ROCs: none (all ROC reports received).
VOs: Alice, ATLAS, CMS, BioMed
List of actions
Minutes
Recording of the meeting
16:00 → 16:05
Feedback on last meeting's minutes (5m)
16:01 → 16:30
EGEE Items (29m)
<big> Grid-Operator-on-Duty handover </big>
From: ROC France / ROC SouthEast Europe
To: ROC Asia Pacific / ROC DECH
NB: Could the grid operator-on-duty teams please submit their reports no later than 12:00 UTC (14:00 Swiss local time).
Issues from France COD team:
Two sites are expected to attend the meeting.
INFN-MILANO (ROC Italy): No answer from the site and no progress (https://gus.fzk.de/pages/ticket_details.php?ticket=27659)
PEARL-AMU (ROC Central Europe): Long-standing problem due to network connectivity (https://gus.fzk.de/pages/ticket_details.php?ticket=25346)
Last answer from the site:
Dear all,
Since all our efforts to remedy the situation have failed, I have asked the AMU authorities for a network correction for the pagaj SE host. This will include changing the subnet and network route for the host, which will hopefully resolve the connectivity problem for our site.
Issues from SouthEast Europe COD team:
No major issues to report
<big> PPS Report & Issues </big>
PPS reports were not received from these ROCs:
Italy, AP
Issues from EGEE ROCs:
At Cyfronet the main switch for the cluster systems failed. Both pre-production and production services (including the SAM UI) were unavailable for about 24h. [CE ROC]
Release News:
gLite 3.1.0 PPS Update08 was released to PPS and is currently undergoing pre-deployment testing.
This update contains (among other patches) the new service:
lcg-CE for SLC4
Because lcg-CE requires the latest version of glite-yaim-core (patch #1413), new versions of the yaim client packages (patch #1415) need to be released to PPS as well, namely:
glite-yaim-clients-4.0.1-1.noarch.rpm
glite-yaim-torque-client-4.0.1-1.noarch.rpm
as well as new versions of the following affected metapackages:
PPS-glite-TORQUE_client-3.1.0-5.noarch.rpm
PPS-glite-UI-3.1.0-8.noarch.rpm
PPS-glite-WN-3.1.0-8.noarch.rpm
<big> New tool for announcing (and receiving notification of) down-time </big>
There is a new tool in the CIC Portal for announcing site and service downtimes. Features of the tool are:
Uses standardized templates so all announcements will look similar (easier to scan) and all relevant information will be captured (no missing information)
The templates send each broadcast to a more targeted set of recipients (spam reduction)
You can subscribe to an RSS feed of messages (by type) rather than receiving them in your inbox (spam reduction)
Speaker:
CIC Portal team
<big> EGEE issues coming from ROC reports </big>
[NE] When will the SL4 32-bit lcg-CE be released?
[NE] We submitted a GGUS ticket (27724) about a problem with GStat, and it has been in the "assigned" status since October 9th. When will somebody take care of it? Details of the ticket: at the moment SARA-MATRIX has the following warning in GStat:
Missing DN and Attributes:
==================
IN: 'dn: GlueSALocalID=dteam:DTEAM_RAW,GlueSEUniqueID=ant2.grid.sara.nl,mds-vo-name=SARA-MATRIX,o=grid'
'GlueSARoot: .+:.+' ()
etc.
However, GlueSARoot is already deprecated in GLUE version 1.2, so this test is wrong.
[France ROC, CGG-LCG2] There is no automatic procedure to clean up $EDG_WL_SCRATCH and the MPI execution directory.
[France ROC, GRIF] Request for a SAM test history covering at least 7 days.
[SE Europe ROC] I've noticed some discrepancies between GGUS and the CIC portal dashboard: some PPS sites appear in the production view of the dashboard.
<big> Move of 'default' CERN AFS UI from gLite 3.0 to gLite 3.1</big>
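To illustrate the change the [NE] GStat ticket asks for, here is a minimal sketch of a per-entry check that still validates any published GlueSARoot value against the `.+:.+` pattern from the warning but no longer reports the attribute as missing when the schema is GLUE 1.2 or later. It assumes an entry is given as a dict mapping attribute names to lists of values (as an LDAP search would return them). The function `check_sa_entry` and its `glue_version` parameter are hypothetical; this is not GStat's actual code.

```python
import re

# Value part of the pattern quoted in the GStat warning ("GlueSARoot: .+:.+").
SAROOT_PATTERN = re.compile(r"^.+:.+$")


def check_sa_entry(attrs, glue_version=(1, 2)):
    """Return a list of warnings for one GlueSA entry (hypothetical helper).

    attrs maps attribute names to lists of values.
    glue_version is the schema version the site publishes, as a tuple.
    """
    warnings = []
    roots = attrs.get("GlueSARoot")
    if roots is None:
        # GlueSARoot is deprecated from GLUE 1.2 on, so a missing value
        # is only reported for older schema versions.
        if glue_version < (1, 2):
            warnings.append("missing GlueSARoot")
        return warnings
    # If the attribute is still published, check that each value has the
    # expected "<something>:<something>" form.
    for value in roots:
        if not SAROOT_PATTERN.match(value):
            warnings.append(f"malformed GlueSARoot: {value!r}")
    return warnings
```

With this behaviour, the SARA-MATRIX entry `GlueSALocalID=dteam:DTEAM_RAW` (which publishes no GlueSARoot) produces no warning under GLUE 1.2, while a malformed value is still flagged.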
16:30 → 17:00
WLCG Items (30m)
<big> Tier 1 reports </big>
Item 1
<big> WLCG issues coming from ROC reports </big>
None this week.
<big>WLCG Service Interventions (with dates / times where known) </big>
VOView consistency problem: reported 3 weeks ago, and 90 queues still have problems. The new list is at the usual location: http://voatlas01.cern.ch/atlas/data/VOViewProblem.log
Last week CNAF had a problem because the shared area was not working. The problem was related to the migration of the shared areas to GPFS. This suggests that any important change in site configuration should always be broadcast at a high level.
Speaker:
Dr Roberto Santinelli (CERN/IT/GD)
<big> ALICE service </big>
Item 1.
Speaker:
Dr Patricia Mendez Lorenzo (CERN IT/GD)
<big> WLCG Service Coordination </big>
WLCG Service Reliability workshop, CERN, November 26-30 (agenda, wiki)
Common Computing Readiness Challenge (CCRC'08): meetings page
ATLAS throughput tests finished and M5 detector cosmics now running till 5 November. Data export from CERN later in the week.