WLCG-OSG-EGEE Operations meeting
28-R-15
CERN conferencing service (joining details below)
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
Attendees:
To dial in to the conference:
a. Dial +41227676000
b. Enter access code 0148141
OR click HERE
(Please specify your name & affiliation in the web-interface)
-
-
16:01
→
16:30
EGEE Items 29m
-
<big> Grid-Operator-on-Duty handover </big>
From: SouthEast and Russia
To: CERN and Germany, Switzerland
Nothing to report from Germany or Switzerland.
-
<big> PPS Report & Issues </big>
Please find issues from EGEE ROCs and general info in:
https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingPps
-
<big> gLite Release News</big>
-
<big> EGEE issues coming from ROC reports </big>
- CentralEurope: The CE RM test fails when the site SE is in downtime. The Central Europe ROC has recommended that site administrators put both the SE and the CE in downtime whenever the SE needs to be in downtime. Additionally, CE-only sites should set up another site's SE as their close SE, which requires an agreement with the administrator of the site that owns the SE. Do other ROCs have similar problems, and how have they solved them?
- Germany, Switzerland: UNI-BONN has problems with its APEL publisher. On 2009-01-13 Robert Zimmermann (UNI-BONN) raised a ticket asking for help from the R-GMA experts, but so far no one from the R-GMA support unit has responded to it: GGUS:45231.
On 2009-01-19 UNI-BONN then received a ticket for failing APEL tests (GGUS:45405).
Comment from me (Steve): the ticket was in fact assigned to the wrong group, although this was not obvious. RAL-LCG2 can correct the situation.
- Italy: Feedback about the new BDII release (ref. GGUS:43230, Savannah PATCH:2671).
Before the update, we experienced random error messages from Nagios: "Could not search/find objectclasses in mds-vo-name=local,o=grid", and from the SAM tests: "egee-bdii.cnaf.infn.it:2170: ERROR: Internal (implementation specific) error lcg_gt: Invalid argument".
All Italian top-BDII instances were updated on 15 January. Since the update, both error messages have disappeared.
- Russia: GGUS:45333 has been assigned to the R-GMA team since 15 January without any response. As a result, the BY-NCPHEP site cannot operate properly.
Comment from chair (Steve): same situation as the BONN site above. I have now reassigned the ticket to RAL-LCG2 for resolution. More generally, I will contact the R-GMA folks, since they should have both on.
- SouthEastern: I got yet another report regarding the BDII stability issue.
Comment from chair (Steve): please provide more information.
- SouthWestern: LIP complains that ATLAS is using 5 GB in the worker nodes' /tmp directory. LIP's WNs have 8 cores and most of the disk space is dedicated to the /home directories. It would be useful to know, for all the LHC VOs, the disk requirements per job for /home and TMPDIR (the scratch space).
Comment from chair (Steve): see the CIC portal VO cards, which contain exactly this information. If they do not reflect your observations, please raise a GGUS ticket.
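As a rough aid for the SouthWestern item on scratch space, the worst-case /tmp usage per worker node can be estimated from the figures in the report (8 cores, 5 GB per ATLAS job). This is only a sketch: it assumes one job slot per core and that every slot runs a job using the same amount of /tmp, neither of which is stated in the report. The function names and the 20 GB capacity in the example are hypothetical.

```python
def tmp_needed_per_node_gb(job_slots, tmp_per_job_gb):
    """Worst-case /tmp usage on one worker node, in GB.

    Assumption (not from the report): every job slot runs a job
    that uses tmp_per_job_gb of scratch space at the same time.
    """
    if job_slots < 0 or tmp_per_job_gb < 0:
        raise ValueError("inputs must be non-negative")
    return job_slots * tmp_per_job_gb


def tmp_fits(job_slots, tmp_per_job_gb, tmp_capacity_gb):
    """True if the worst-case usage fits in the available /tmp space."""
    return tmp_needed_per_node_gb(job_slots, tmp_per_job_gb) <= tmp_capacity_gb


# Figures from the LIP report: 8 cores per WN, 5 GB of /tmp per ATLAS job.
# One slot per core is an assumption.
worst_case = tmp_needed_per_node_gb(8, 5)
print(worst_case)  # 40

# Hypothetical /tmp capacity of 20 GB, for illustration only.
print(tmp_fits(8, 5, 20))  # False
```

Comparing such an estimate with the per-job values published in the VO cards gives a site a concrete number to quote if it needs to raise a GGUS ticket.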
-
<big>SAM</big> 15m
A new version of the SAM portal is available on the Validation server at:
https://sam-val.cern.ch:8443/sam-gw/sam.py
The portal solves the broken history display of previous versions by invoking the corresponding GridView pages. The Portal at the above URL points to the Production DB, since GridView has no Validation setup. Users are encouraged to have a look and give any feedback to judit.novak at cern.
-
<big>Operational Security</big> 15m
- SecOp at Beijing
- Change csirt email address
- CAs installation set.
-
-
16:30
→
17:00
WLCG Items 30m
-
<big> WLCG issues coming from ROC reports </big>
- ROC ???: Item
-
<big>WLCG Service Interventions (with dates / times where known) </big>Link to CIC Portal (broadcasts/news), scheduled downtimes (GOCDB) and CERN IT Status Board
Many interventions scheduled this week. Please consult the URLs above for details. Time at WLCG T0 and T1 sites.
-
<big> WLCG Operational Review </big>Speaker: Harry Renshall / Jamie Shiers
-
<big> Alice report </big>
- Item
-
<big> Atlas report </big>
- Item
-
<big> CMS report </big>Speaker: Daniele Bonacorsi
-
<big> LHCb report </big>
- Item
-
<big> Storage services: Recommended base versions </big>The recommended baseline versions for the storage solutions can be found here: https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions
-
-
17:00
→
17:30
OSG Items 30m
Speaker: Rob Quick (OSG - Indiana University)
-
Discussion of open tickets for OSG
-
-
17:30
→
17:35
Review of action items 5m
-
17:35
→
17:36
AOB 1m
-