WLCG-OSG-EGEE Operations meeting

28-R-15 (CERN conferencing service (joining details below))



Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
  • OSG operations team
  • EGEE operations team
  • EGEE ROC managers
  • WLCG coordination representatives
  • WLCG Tier-1 representatives
  • other site representatives (optional)
  • GGUS representatives
  • VO representatives
  • To dial in to the conference:
    a. Dial +41227676000
    b. Enter access code 0140768


    NB: Reports were not received in advance of the meeting from:

  • ROCs: South West Europe
  • VOs: ATLAS; ALICE; CMS
  • Recording of the meeting
      • 4:00 PM - 4:00 PM
        Feedback on last meeting's minutes
      • 4:01 PM - 4:30 PM
        EGEE Items 29m
        • Grid-Operator-on-Duty handover
          From: DECH / Italy
          To: CE / UK/I

          Report from DECH COD:
          1. One ticket escalated to the Ops meeting: GGUS #34400 (https://gus.fzk.de/ws/ticket_info.php?ticket=34400)
            The site fails because SAM uses an old version of lcg-utils. ROC_CERN already promised to upgrade but has not managed to do so yet.
          Report from Italian COD:
          1. We suggest that the site IL-IUCC (ROC-SEE) be suspended because of the following tickets:
            #37110 (SRM)
            #36185 (SE, same problem, closed)
            #36262 (CE)
            These tickets have been open for a long time (since early May) with no feedback, and the site has been in downtime for a long time.
            The case was not escalated last Friday: we (COD) tried to give suggestions on Thursday, and then we had major problems at CNAF on Friday.
            Checked again today: still no feedback.
        • PPS Report & Issues
          Issues from the EGEE ROCs and general information can be found in:

        • gLite Release News

          Release News:
          Please find gLite release news in:

        • EGEE issues coming from ROC reports
          1. [SEE ROC]: Information: TAU-LCG2 is now closed, due to poor site administration and availability.

      • 4:30 PM - 5:00 PM
        WLCG Items 30m
        • WLCG issues coming from ROC reports
          1. None in the reports.
        • WLCG Service Interventions (with dates/times where known)
          Links to the CIC Portal (broadcasts/news), scheduled downtimes (GOCDB) and the CERN IT Status Board

          1. Due to network maintenance, SARA's 3D database (saradb) will be unavailable on 30/06/2008 from 16:00 UTC until 18:00 UTC.

          Time at WLCG T0 and T1 sites.

        • Baseline versions of Storage Middleware
          This is a list of the versions currently supported by the Grid Storage Systems Developers. We also indicate the recommended version to install.

          • 2.1.7-10 will be released this week
            • Tier-1s are recommended to upgrade; they will upgrade around mid-July
          • 2.1.8 will be released in the first week of August
            • Tier-0 will upgrade by the end of August
            • Tier-1s will follow
          • Current recommended version is 1.3-27 on SLC3
          • Recommended version is 2.7-1 on SLC4, as soon as it is released.
          For the CASTOR core, support is provided for 2.1.n and 2.1.(n-1), where n is the version currently installed at the Tier-1s. However, as soon as the Tier-1s move to 2.1.7, 2.1.6 will no longer be supported.
          For CASTOR SRM, 2.7-n and 1.3-27 will be supported until further notice.

          The current version is 1.8.0-15p6, which fixes an important bug in the caching of credentials produced by grid-proxy-init. Patch release 7 is about to come out; it fixes a problem with checksum verification when copying a file in push mode between two dCache sites. 1.8.0-15p7 will be the recommended version as soon as it is out (in the next few days).
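          The dCache baseline above hinges on the patch level (p6 versus p7). A minimal Python sketch of checking an installed dCache version against the recommended baseline follows. It assumes the version format MAJOR.MINOR.MICRO-RELEASEpPATCH, inferred only from the two version strings quoted here; the helper names are hypothetical and not part of any dCache tool.

```python
import re

# Assumed format, inferred from "1.8.0-15p6" and "1.8.0-15p7":
# MAJOR.MINOR.MICRO-RELEASEpPATCH. Other dCache version styles are not handled.
_DCACHE_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)p(\d+)$")


def parse_dcache_version(version: str) -> tuple[int, ...]:
    """Turn '1.8.0-15p6' into (1, 8, 0, 15, 6) so versions compare numerically."""
    match = _DCACHE_VERSION.match(version)
    if match is None:
        raise ValueError(f"unrecognised dCache version string: {version!r}")
    return tuple(int(group) for group in match.groups())


def meets_baseline(installed: str, baseline: str) -> bool:
    """Return True if the installed version is at least the baseline version."""
    # Tuples compare element by element, so the patch level decides
    # only when major, minor, micro and release are all equal.
    return parse_dcache_version(installed) >= parse_dcache_version(baseline)


if __name__ == "__main__":
    # A site still on p6 does not yet meet the p7 baseline.
    print(meets_baseline("1.8.0-15p6", "1.8.0-15p7"))  # False
```

          Comparing integer tuples rather than raw strings keeps "1.8.0-15p10" above "1.8.0-15p9", which a plain string comparison would get wrong.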

          Recommended and supported version is 1.3.20 on SLC4.

          Recommended and supported version is 1.6.10 on SLC4.
        • WLCG Operational Review
          Speaker: Harry Renshall / Jamie Shiers
        • ALICE report
        • ATLAS report
        • CMS report
          Speaker: Daniele Bonacorsi
        • LHCb report
          LHCb is running its DC06 activity at full capacity. The stripping activity revealed issues at: RAL (jobs still cannot get there after the incident in which a rogue CMS user brought the system down by submitting 10K jobs at once); IN2P3 (data access problem with gsidcap; Philippe sent all the information through the GGUS ticket opened for debugging their SRM/SE system); CNAF (in downtime); and SARA (failures uploading output data).
          1. CIC broadcast information: we received one for IN2P3 with the reason "queues drained" (which is not very useful or informative), and one for GRIF announcing a scheduled intervention with the reason "unexpected hardware failure" (but how, then, can the intervention have been scheduled?).
      • 5:00 PM - 5:30 PM
        OSG Items 30m
        Speaker: Rob Quick (OSG - Indiana University)
        • Discussion of open tickets for OSG
          GGUS ticket
      • 5:30 PM - 5:35 PM
        Review of action items 5m
        list of actions
      • 5:35 PM - 5:35 PM