28-R-15 (CERN conferencing service (joining details below))
firstname.lastname@example.org Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
OSG operations team
EGEE operations team
EGEE ROC managers
WLCG coordination representatives
WLCG Tier-1 representatives
other site representatives (optional)
To dial in to the conference:
a. Dial +41227676000
b. Enter access code 0140768
NB: Reports were not received in advance of the meeting from:
ROCs: South West Europe
VOs: ATLAS; ALICE; CMS
Recording of the meeting
Feedback on last meeting's minutes
<big> Grid-Operator-on-Duty handover </big>
From: DECH / Italy
To: CE / UK/I
Report from DECH COD:
One ticket escalated to the Ops meeting: GGUS #34400 (https://gus.fzk.de/ws/ticket_info.php?ticket=34400)
The site fails because SAM uses an old version of lcg-utils. ROC_CERN has already promised to upgrade but has evidently not managed to do so yet.
Report from Italian COD:
Suggested that the site IL-IUCC (ROC-SEE) be suspended.
#36185 (SE, same problem, closed)
These tickets have been open for a long time (since early May) with no feedback. The site has been in downtime for a long time.
Not escalated last Friday: we (the COD) tried to give suggestions on Thursday, then we had big problems at CNAF on Friday.
Checked again today: still no feedback.
<big> PPS Report & Issues </big>
Please find Issues from EGEE ROCs and general info in:
<big> Baseline versions of Storage Middleware </big>
This is a list of the versions currently supported by the Grid Storage Systems Developers. We also outline the recommended version to have installed.
2.1.7-10 will be released this week
Tier1s are recommended to upgrade and will do so around mid-July
2.1.8 will be released the first week of August
Tier0 will upgrade by the end of August
Tier1 will follow
Current recommended version is 1.3-27 on SLC3
Recommended version is 2.7-1 on SLC4 as soon as it is released.
For CASTOR core, support is provided for 2.1.n and 2.1.[n-1], where n is the version currently installed at the Tier-1s. However, as soon as the Tier-1s move to 2.1.7, 2.1.6 will no longer be supported.
For CASTOR SRM, 2.7-n and 1.3-27 will be supported until further notice.
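The n / n-1 support rule stated above for CASTOR core, together with its 2.1.6 exception, can be sketched as a small check. This is only an illustration of one reading of the rule: the function names and the `dropped` parameter are hypothetical and are not part of any CASTOR tool.

```python
def supported_core_versions(n, dropped=frozenset()):
    """Return the CASTOR core 2.1.x releases covered by the n / n-1 rule.

    n       -- minor release installed at the Tier-1s (e.g. 7 for 2.1.7)
    dropped -- minor releases explicitly withdrawn from support
               (assumption: this is how the 2.1.6 exception is modelled)
    """
    candidates = (n, n - 1)
    return [f"2.1.{m}" for m in candidates if m >= 0 and m not in dropped]


def is_supported(version, n, dropped=frozenset()):
    """Check a version string such as '2.1.7-10', ignoring the release suffix."""
    base = version.split("-", 1)[0]
    return base in supported_core_versions(n, dropped)


# Tier-1s on 2.1.6: both 2.1.6 and 2.1.5 are covered.
print(supported_core_versions(6))                 # ['2.1.6', '2.1.5']
# Tier-1s on 2.1.7: 2.1.6 is explicitly dropped, so only 2.1.7 remains.
print(supported_core_versions(7, dropped={6}))    # ['2.1.7']
print(is_supported("2.1.7-10", 7, dropped={6}))   # True
```

Passing the withdrawn release explicitly keeps the general rule and the one-off exception separate, so the same check still applies if a later announcement changes which versions are dropped.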
Current version is 1.8.0-15p6, which fixes an essential bug in the caching of credentials produced through grid-proxy-init.
Patch release 7 is about to come out. It fixes a problem with checksum verification when copying a file in push mode between two dCache sites.
1.8.0-15p7 is the recommended version as soon as it is out (in the next few days).
Recommended and supported version is 1.3.20 on SLC4.
Recommended and supported version is 1.6.10 on SLC4.
<big> WLCG Operational Review </big>
Harry Renshall / Jamie Shiers
<big> ALICE report </big>
<big> ATLAS report </big>
<big> CMS report </big>
<big> LHCb report </big>
LHCb is running its DC06 activity at full speed. The stripping activity pointed out issues at RAL (jobs still cannot get there after the incident in which a rogue CMS user killed the system with 10K jobs at once), IN2P3 (data access problem with gsidcap; Philippe sent all the information through the GGUS ticket opened for debugging their SRM/SE system), CNAF (in downtime) and SARA (failures uploading output data).
CIC broadcast information: one broadcast was received for IN2P3, giving the reason "queues drained" (which is not very useful or informative), and one for GRIF, announcing a scheduled intervention with the reason "unexpected hardware failure" (but how can the intervention then be scheduled?).