WLCG-OSG-EGEE Operations meeting
Date/Time: Monday, 7 July 2008 - 16:00 (Europe/Zurich)
Location: CERN conferencing service (joining details below) (28-R-15)
Chairperson: Steve Traylen (CERN)
Description: grid-operations-meeting@cern.ch
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
Attendees:
  • OSG operations team
  • EGEE operations team
  • EGEE ROC managers
  • WLCG coordination representatives
  • WLCG Tier-1 representatives
  • other site representatives (optional)
  • GGUS representatives
  • VO representatives
To dial in to the conference:
    a. Dial +41227676000
    b. Enter access code 0140768

    NB: Reports were not received in advance of the meeting from:

  • ROCs: ???
  • VOs: No VO reports received

     Monday, 7 July 2008
     16:00
    Feedback on last meeting's minutes    
     16:01
    EGEE Items (29')    
    • Grid-Operator-on-Duty handover
      From: UK/I / Taiwan
      To: CERN / Italy


      No reports from this week's COD teams.
     
     
     
    • EGEE issues coming from ROC reports
      1. [ROC France]: Any progress on action 000212, assigned to Steve (i.e. publishing the Production Role restriction from the CE queue)?

      2. [ROC DECH]: SAM had some problems this week ...

        SAM Problem: (network) problem with the CERN BDII used by the RB/WMS for job submission.

        SAM Problem: File missing for host certificate test.

      3. [ROC Italy]: Issue from INFN-T1:

        We noticed this problem in the GOC DB: when the status of an open downtime is changed, for example from Risk to Outage, the history is lost. So if we open a downtime with "Risk" status and after 3 days switch it to "Outage" status, the GOC DB shows that we have been in "Outage" status the whole time. One workaround seems to be to close the "Risk" downtime and open a new one with "Outage" status.

      4. [ROC South Eastern Europe]: 99% of SEE WNs are now SL4 with gLite 3.1. We are also testing the SDJ configuration as described at http://egee-intranet.web.cern.ch/egee-intranet/NA1/TCG/wgs/SDJ-WG-TEC-v1.1.pdf

        Do other regions have any experience to share on this matter?

      5. [ROC UK/I]: SAM still reports sites as failing when there is a well-identified grid-wide failure. What is the timeline for no longer publishing these failures to sites, which otherwise spend time trying to work out the problem whenever such failures are published?
     
     16:30
    WLCG Items (30')    
    • WLCG issues coming from ROC reports
      1. none
     
     
    • WLCG Operational Review
    Harry Renshall / Jamie Shiers  
    • ALICE report
     
    • ATLAS report
     
    • CMS report
    Daniele Bonacorsi  
    • LHCb report
     
    • Recommended base versions for storage services:
     
     17:00
    OSG Items (30')   Rob Quick (OSG - Indiana University)  
    • Discussion of open tickets for OSG
     
     17:30
    Review of action items (5')    
     17:35
    AOB