Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
  • OSG operations team
  • EGEE operations team
  • EGEE ROC managers
  • WLCG coordination representatives
  • WLCG Tier-1 representatives
  • other site representatives (optional)
  • GGUS representatives
  • VO representatives
        Feedback on last meeting's minutes
        EGEE Items 29m
        • Grid-Operator-on-Duty handover
          From: UK/I / Taiwan
          To: CERN / Italy

          No reports from this week's COD teams.
        • PPS Report & Issues
          Please find Issues from EGEE ROCs and general info in:

        • gLite Release News
          The latest version of LFC (3.1.12-0) contains a bug which can cause it to hang or crash. Workarounds are provided in the Known Issues section of the LFC service (both Oracle and MySQL) in the gLite web portal (http://glite.web.cern.ch/glite/packages/R3.1/deployment/glite-LFC_mysql/glite-LFC_mysql-known-issues.asp and http://glite.web.cern.ch/glite/packages/R3.1/deployment/glite-LFC_oracle/glite-LFC_oracle-known-issues.asp).

          Release News:
          Please find gLite release news in:


        • EGEE issues coming from ROC reports
          1. [ROC France]: Any progress on action 000212, assigned to Steve (i.e. publishing the Production Role restriction from the CE queue)?

          2. [ROC DECH]: SAM had some problems this week ...

            SAM Problem: (network) problem with the CERN BDII used by the RB/WMS for job submission.

            SAM Problem: File missing for host certificate test.

          3. [ROC Italy]: Issue from INFN-T1:

            We noticed this problem in the GOC DB: when the status of an open downtime is changed, for example from Risk to Outage, the history is lost. If we open a downtime with "Risk" status and switch it to "Outage" status after 3 days, the GOC DB shows it as having always been in "Outage" status. One possible solution seems to be to close the "Risk" downtime and open a new one with "Outage" status.

          4. [ROC South Eastern Europe]: 99% of SEE WNs are now SL4 with gLite 3.1. We are also testing the SDJ configuration as described at http://egee-intranet.web.cern.ch/egee-intranet/NA1/TCG/wgs/SDJ-WG-TEC-v1.1.pdf

            Do other regions have some experience to share on this matter?

          5. [ROC UK/I]: SAM still reports sites as failing when there is a well-identified grid-wide failure. What is the timeline for no longer publishing these failures to sites? When they are published, sites spend time trying to figure out the problem.
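          The GOC DB downtime issue raised by ROC Italy (item 3) can be illustrated with a small sketch. This is not the GOC DB schema or API: the `Downtime` record, its fields, and both helper functions are hypothetical names chosen only to show why overwriting the status in place loses the history, and how closing the "Risk" downtime and opening a new "Outage" one keeps it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Downtime:
    """Hypothetical downtime record (assumption, not the real GOC DB model)."""
    site: str
    severity: str                    # "Risk" or "Outage"
    start: datetime
    end: Optional[datetime] = None   # None while the downtime is still open


def escalate_in_place(downtime: Downtime, new_severity: str) -> Downtime:
    """Behaviour as reported: the status is overwritten, so the earlier
    "Risk" period is no longer visible anywhere in the record."""
    downtime.severity = new_severity
    return downtime


def escalate_by_reopening(history: List[Downtime], new_severity: str,
                          when: datetime) -> List[Downtime]:
    """Proposed workaround: close the open downtime at `when` and open a
    new one with the new severity, so both periods stay on record."""
    current = history[-1]
    current.end = when
    history.append(Downtime(current.site, new_severity, start=when))
    return history


# Arbitrary example date; the site name is taken from the report above.
opened = datetime(2008, 1, 1)
changed = opened + timedelta(days=3)

overwritten = escalate_in_place(Downtime("INFN-T1", "Risk", opened), "Outage")

history = escalate_by_reopening([Downtime("INFN-T1", "Risk", opened)],
                                "Outage", changed)
```

          With the in-place change, `overwritten` carries only the "Outage" status from the original start time. With the workaround, `history` holds a closed "Risk" entry followed by an open "Outage" entry starting three days later.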
        WLCG Items 30m
        • WLCG issues coming from ROC reports
          1. none
        • WLCG Service Interventions (with dates / times where known)
          Link to CIC Portal (broadcasts/news), scheduled downtimes (GOCDB) and CERN IT Status Board

          Time at WLCG T0 and T1 sites.

        • WLCG Operational Review
          Speaker: Harry Renshall / Jamie Shiers
        • Alice report
        • Atlas report
        • CMS report
          Speaker: Daniele Bonacorsi
        • LHCb report
        • Recommended base versions for storage services:
        OSG Items 30m
        Speaker: Rob Quick (OSG - Indiana University)
        • Discussion of open tickets for OSG
        Review of action items 5m
