WLCG-OSG-EGEE Operations meeting

Europe/Zurich
28-R-15 (CERN conferencing service (joining details below))


Nick Thackray
Description
grid-operations-meeting@cern.ch
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
Attendees:
  • OSG operations team
  • EGEE operations team
  • EGEE ROC managers
  • WLCG coordination representatives
  • WLCG Tier-1 representatives
  • other site representatives (optional)
  • GGUS representatives
  • VO representatives
To dial in to the conference:
    a. Dial +41227676000
    b. Enter access code 0148141

    OR click HERE
    (Please specify your name & affiliation in the web-interface)

    Click here for minutes of all meetings

    Click here for the List of Actions

      • 16:01 16:30
        EGEE Items 29m
        • Grid-Operator-on-Duty handover
          From: Russia and SEE
          To: CH-DE and CERN


          Report from SEE ?: Last week was relatively quiet; many of the alarms on our dashboard were due to sites in downtime. (The problem of alarms not aging over the weekend has yet to be fixed.)
          • SITE NAME : IN-DAE-VECC-EUINDIAGRID, (NODE: vecce01.vecc.eu-india.res.in) ROC NAME : AP, GGUS TICKET NUMBER : GGUS:46058

            The failing test is the APEL test. However, the site has not responded to ROD requests for updates or information for over 3 weeks.

          Report from Russia:
          1. SITE NAME : INFN-FERRARA
            ROC NAME : Italy
            GGUS TICKET NUMBER : GGUS:45539
            Reason for escalation: the site has been in SD since 2009-01-30.
        • PPS Report & Issues
          Please find Issues from EGEE ROCs and general info in: https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingPps

        • gLite Release News
        • EGEE issues coming from ROC reports
          • ROC CentralEurope:
            Gstat reports more free CPUs than total CPUs (GGUS:47098) after the Physical CPU value was changed as requested in ticket GGUS:47040.

            Comments from the chair: the gstat staff are working at CERN at the moment, so I hope this will be fixed shortly. Thank you for reporting.

          • France
            IN2P3-CC (T1/T2): An unexpected electrical outage during the last scheduled downtime caused a temporary outage of most of the core services hosted at IN2P3-CC: the VOMS server for Biomed, Egeode, Auvergrid, etc., the central LFC for Biomed, the local LFC for Atlas, LHCb, and so on. It took 3 or 4 hours to restart the critical services. Sorry for the inconvenience.
        • Grid Service Interventions
          CERN: significant network disruption, 19th March, 06:00 to 08:00 CET.

          To profit from this intervention, the UK has suggested that regions and sites record the problems seen during this time. Please report back next week so that the results can be considered.

          Link to CIC Portal (broadcasts/news), scheduled downtimes (GOCDB) and CERN IT Status Board

          Please consult the URLs above for details.

        • New GridMap 15m
          A new version of the GridMap monitoring tool has been released at http://gridmap.cern.ch/

          Here is the list of new/improved features:

          • The sites are sized by the number of logical CPUs (cores)
          • OSG sites participating in WLCG can be shown
          • Improved WLCG "tiers" view
          • Sites are shown if they are listed in the MoU, CEs are shown if they support WLCG
          • When SI2k (SpecInt2000) is selected, sites are sized by "WLCG Installed Capacity"
          • Sites in Maintenance are coloured grey (status is calculated as in GridView)
          • Bookmarking and the browser's "back" button work
          • User experience improvements and bugfixes
          • Debugging feature: inspection of the logical and physical CPU numbers published in the BDII
      • 16:30 17:00
        WLCG Items 30m
        • WLCG issues coming from ROC reports
          None
        • Wiki page containing FTM Endpoints
          Can all Tier-1 sites please keep the list of FTM endpoints up to date? The list is here: https://twiki.cern.ch/twiki/bin/view/LCG/LCGFTMEndpoints
        • WLCG Operational Review
          The minutes of the daily WLCG Operations meetings (one file per week) are available here: https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOperationsMeetings
          Speaker: Harry Renshall / Jamie Shiers
        • Alice items
        • Atlas items

        • CMS items
          1. Please have a look at the daily reports given at WLCG daily calls here.
        • LHCb items
        • WLCG service recommended baseline versions
          FTS Configuration
          The current FTS tries SRM v1 unless the endpoints are published correctly.
          To force SRM v2.2, use

          FTA_GLOBAL_ACTIONS_SRMVERSION="2.2"

          Once glite-data-transfer-agents 3_3_4_1 is released, this will be the default anyway.
          Given that all sites are now using SRM v2.2, we recommend adding this configuration to all current FTSes now, before this upgrade reaches production.
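
As a rough aid for sites applying the recommendation above, the following Python sketch checks whether a piece of configuration text sets FTA_GLOBAL_ACTIONS_SRMVERSION to "2.2". It assumes the configuration is written as shell-style KEY="value" lines, which this agenda does not confirm; the function names and the sample text are hypothetical and are not part of FTS.

```python
import re

# The setting name comes from the agenda item above. The shell-style
# KEY="value" layout is an assumption about how the FTS configuration is written.
SETTING = "FTA_GLOBAL_ACTIONS_SRMVERSION"

_PATTERN = re.compile(r'^\s*' + re.escape(SETTING) + r'\s*=\s*"?([^"\s#]*)"?')


def srm_version_setting(config_text):
    """Return the last value assigned to SETTING, or None if it is never set."""
    value = None
    for line in config_text.splitlines():
        match = _PATTERN.match(line)  # commented-out lines do not match
        if match:
            value = match.group(1)  # a later assignment overrides an earlier one
    return value


def forces_srm_v22(config_text):
    """True if the configuration text forces SRM v2.2."""
    return srm_version_setting(config_text) == "2.2"


if __name__ == "__main__":
    # Hypothetical sample configuration, for illustration only.
    sample = 'SOME_OTHER_SETTING="x"\nFTA_GLOBAL_ACTIONS_SRMVERSION="2.2"\n'
    print("SRM v2.2 forced:", forces_srm_v22(sample))
```

The check is deliberately limited to reading text: it does not locate or modify any real FTS configuration file, whose path is not given here.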

          The recommended baseline versions can be found here: https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions

      • 17:00 17:30
        OSG Items 30m
        Speaker: Rob Quick (OSG - Indiana University)
        • Discussion of open tickets for OSG
          Information taken from the weekly escalation reports.

          GGUS #44104: Decided on 20090112 to ignore it for a number of weeks.
          GGUS #46646: We should examine offline why this contains a "Solution" in OSG but, in GGUS, is neither 'Solved' nor 'Waiting for reply'.
          GGUS #46647: Duplicate of the above? If 'yes', is this human error?
          GGUS #46682: 'Solved' in OSG. Could GGUS please check how this could also appear 'Solved' in GGUS automatically?

      • 17:30 17:35
        Review of action items 5m
      • 17:35 17:35
        AOB