WLCG-OSG-EGEE Operations meeting
Europe/Zurich
28-R-15 (CERN conferencing service (joining details below))
Description
grid-operations-meeting@cern.ch
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. Reported issues are discussed, assigned to the relevant teams, followed up, and escalated when needed. The meeting is also the forum for sites to get a summary of the weekly WLCG activities and plans.
Attendees:
OSG operations team
EGEE operations team
EGEE ROC managers
WLCG coordination representatives
WLCG Tier-1 representatives
other site representatives (optional)
GGUS representatives
VO representatives
ROCs: South West Europe
VOs: ATLAS; Alice; CMS
To dial in to the conference:
a. Dial +41227676000
b. Enter access code 0140768
NB: Reports were not received in advance of the meeting from:
-
-
16:00 → 16:01
Feedback on last meeting's minutes 1m
-
16:01 → 16:30
EGEE Items 29m
-
Grid-Operator-on-Duty handover
From: DECH / Italy
To: CE / UK/I
Report from DECH COD:
- One ticket escalated to the Ops meeting: GGUS #34400 (https://gus.fzk.de/ws/ticket_info.php?ticket=34400)
The site fails because SAM uses an old version of lcg-utils. ROC_CERN has already promised to upgrade but has evidently not managed to do so yet.
- Suggested that the site IL-IUCC (ROC-SEE) be suspended.
#37110 (SRM)
#36185 (SE, same problem, closed)
#36262 (CE)
These tickets have been open for a long time (since early May) with no feedback. The site has been in downtime for a long time.
They were not escalated last Friday: we (COD) tried to give suggestions on Thursday, and then we had major problems at CNAF on Friday.
Checked today: still no feedback.
-
PPS Report & Issues
Please find issues from EGEE ROCs and general information in:
https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingPps
-
gLite Release News
Please find gLite release news in:
https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingGliteReleases
-
EGEE issues coming from ROC reports
- [SEE ROC]: Information: TAU-LCG2 is now closed, due to poor site administration and availability.
-
-
16:30 → 17:00
WLCG Items 30m
-
WLCG issues coming from ROC reports
- None in the reports.
-
WLCG Service Interventions (with dates / times where known)
Link to CIC Portal (broadcasts/news), scheduled downtimes (GOCDB) and CERN IT Status Board
- Due to network maintenance, SARA's 3D database, saradb, will be unavailable on 30/06/2008 from 16:00 UTC until 18:00 UTC.
Time at WLCG T0 and T1 sites.
-
Baseline versions of Storage Middleware
This is a list of the versions currently supported by the Grid Storage Systems developers. We also indicate the recommended version to have installed.
CASTOR
Core
- 2.1.7-10 will be released this week
- Tier1s are recommended to upgrade; they will upgrade around mid-July
- 2.1.8 will be released the first week of August
- Tier0 will upgrade by the end of August
- Tier1 will follow
SRM
- Current recommended version is 1.3-27 on SLC3
- Recommended version is 2.7-1 on SLC4 as soon as released.
For CASTOR core, support is granted for 2.1.n and 2.1.[n-1], where n is the version currently installed at Tier-1s. However, as soon as Tier-1s move to 2.1.7, 2.1.6 will no longer be supported.
For CASTOR SRM, 2.7-n and 1.3-27 will be supported until further notice.
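As an illustration only, the CASTOR core support rule stated above (2.1.n and 2.1.[n-1], with an announced exception) could be sketched as follows. The function names and the `dropped_series` parameter are hypothetical and not part of any CASTOR tool; the sketch assumes version strings of the form `2.1.7-10`.

```python
def parse_castor_version(version):
    """Split a version string such as '2.1.7-10' into (major, minor, n, patch)."""
    base, _, patch = version.partition("-")
    major, minor, n = (int(part) for part in base.split("."))
    return major, minor, n, (int(patch) if patch else 0)


def is_core_supported(version, tier1_version, dropped_series=frozenset()):
    """Return True if `version` is in series n or n-1 of the Tier-1 version.

    `dropped_series` (hypothetical) lists (major, minor, n) series that were
    explicitly announced as unsupported, overriding the n-1 rule.
    """
    major, minor, n, _ = parse_castor_version(version)
    t_major, t_minor, t_n, _ = parse_castor_version(tier1_version)
    if (major, minor, n) in dropped_series:
        return False
    return (major, minor) == (t_major, t_minor) and n in (t_n, t_n - 1)
```

With Tier-1s at 2.1.7-10, the plain rule would still accept a 2.1.6 release; passing `dropped_series={(2, 1, 6)}` models the announced exception that 2.1.6 is dropped once Tier-1s move to 2.1.7.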
dCache
Current version is 1.8.0-15p6, which fixes an important bug with caching credentials produced through grid-proxy-init. Patch release 7 is about to come out; it fixes a problem with checksum verification when copying a file in push mode between two dCache sites. 1.8.0-15p7 is the recommended version as soon as it is out (in the next few days).
StoRM
Recommended and supported version is 1.3.20 on SLC4.
DPM
Recommended and supported version is 1.6.10 on SLC4.
-
WLCG Operational Review
Speaker: Harry Renshall / Jamie Shiers
-
Alice report
-
Atlas report
-
CMS report
Speaker: Daniele Bonacorsi
-
LHCb report
LHCb is running its DC06 activity at full capacity. The stripping activity revealed issues at several sites:
- RAL: jobs still cannot get there after the incident in which a rogue CMS user brought down the system by submitting 10K jobs at once.
- IN2P3: data access problem with gsidcap. Philippe sent all the information through the GGUS ticket opened for debugging their SRM/SE system.
- CNAF: in downtime.
- SARA: failures uploading output data.
- CIC broadcast information: one broadcast was received for IN2P3 with the reason "queues drained", which is not very informative, and one for GRIF announcing a scheduled intervention with the reason "unexpected hardware failure" (how can an intervention for an unexpected failure be scheduled?).
-
- 17:00 → 17:30
- 17:30 → 17:35
-
17:35 → 17:36
AOB 1m
-