WLCG-OSG-EGEE Operations meeting

Europe/Zurich
28-R-15 (CERN conferencing service (joining details below))

Description
grid-operations-meeting@cern.ch
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up, and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
Attendees:
  • OSG operations team
  • EGEE operations team
  • EGEE ROC managers
  • WLCG coordination representatives
  • WLCG Tier-1 representatives
  • other site representatives (optional)
  • GGUS representatives
  • VO representatives
  • To dial in to the conference:
    a. Dial +41227676000
    b. Enter access code 0140768

    NB: Reports were not received in advance of the meeting from:

  • ROCs: SEE
  • VOs: ATLAS, ALICE, BioMed, CMS, LHCb
  • list of actions
    Minutes
    Recording of the meeting
      • 16:00 16:00
        Feedback on last meeting's minutes
        Minutes
      • 16:01 16:30
        EGEE Items 29m
        • Grid-Operator-on-Duty handover
          From: UKI / France
          To: CE / CERN


          Report from UKI COD:
          1. Four 2nd mails sent to site admins
          2. Four mails sent without escalation
          3. Ticket 7554: still no resolution
          4. Ticket 6062 might need political escalation
          Report from France COD:
          1. RO-08-UVT has new certificates for each service and a new domain address (from .info.uvt.ro to .grid.info.uvt.ro). Don't open tickets for nodes in the old domain address.
            From the site admin of RO-08-UVT: "I also have a new user certificate with a new DN (old certificate expired) and so I cannot access GOCDB to do the modifications. I am still waiting for the GOCDB Administrators to change my DN (5 days ago I sent an email and yesterday I even opened a "DN change request" ticket through GOCDB, but I have had no answer so far) to have access to modify the RO08UVT@GOCDB services information (new domain and new services DN)"
          2. A new ticket escalation for TAU-LCG2: this site was supposed to be suspended by the ROC. We ask for suspension again.
          3. GR-04-FORTH-ICS: The last step of escalation has been reached. The last entry from the site said that the problem was solved, but I have seen only failures for the past 10 days.
        • PPS Report & Issues
          PPS reports were not received from these ROCs:
          AP, CERN, CE, IT, NE, SEE


          Issues from EGEE ROCs:
          1. None reported

          Release News:
          1. The last update to PPS (gLite 3.1.0 PPS Update 21) was released on Friday 17th.
            No other updates received since then. Release notes in:
            https://twiki.cern.ch/twiki/bin/view/EGEE/PPSReleaseNotes_310_PPS_Update21
        • EGEE issues coming from ROC reports
          1. (ROC France): From 21-03 to 25-03: SAM problem ("BrokerHelper: no compatible resources" with wms112.cern.ch). This caused all sites to fail the tests.

          2. (ROC UKI): As noted in the GLASGOW site comments, we are seeing a lot of short-lived errors in SAM recently. What is causing this and will the tests improve again?
            Report from UKI-NORTHGRID-LANCS-HEP site:
            We had a number of failures due to the SAM test user proxy expiring after the Easter weekend. Is the expiry time on the proxies long enough to take account of long holiday weekends?

          3. (ROC UKI): The SL4 upgrades at several UKI sites have not gone smoothly. GGUS tickets are being raised where problems are found, but what is the experience in other ROCs?

        • gLite Release News
          1. An update on the gLite 3.1 baseline will be released this week.
            The content of the release is not defined yet, although it will certainly contain MONBOX 3.1.
      • 16:30 17:00
        WLCG Items 30m
        • WLCG issues coming from ROC reports
          1. None this week.
        • WLCG Service Interventions (with dates / times where known)
          Link to CIC Portal (broadcasts/news), scheduled downtimes (GOCDB) and CERN IT Status Board

          1. Reminder: FZK-LCG2: Complete downtime 1.4. (5:00-21:00 UTC) for hardware maintenance and basic OS and firmware updates.


          Time at WLCG T0 and T1 sites.

        • CCRC'08 Operational Review
          • Item 1
          Speaker: Harry Renshall / Jamie Shiers
        • ALICE report
          No report received before the meeting.
        • ATLAS report
          1. ATLAS sites with lcg-utils for SRM2:
            We have developed a SAM test to see which version of lcg-utils has been installed on the WNs of the ATLAS supporting sites.
            The results can be seen on the SAM web page by selecting ATLAS VO, CE, CE-sft-lcg-version
            SAM link
            The sites that give ERROR in this test have not upgraded to the SRM2-compatible version of lcg-utils.
            We hope this helps in following up the action of having the WNs upgraded to SRM2 at all ATLAS supporting sites.
        • CMS report
          No report received before the meeting.
          Speaker: Daniele Bonacorsi
        • LHCb report
          No report received before the meeting.
      • 17:00 17:30
        OSG Items 30m
        Speaker: Rob Quick (OSG - Indiana University)
        • Discussion of open tickets for OSG
          The only outstanding ticket is: https://gus.fzk.de/ws/ticket_info.php?ticket=31037
      • 17:30 17:35
        Review of action items 5m
        list of actions
      • 17:35 17:35
        AOB
        1. Item 1