WLCG-OSG-EGEE Operations meeting

Europe/Zurich
28-R-15 (CERN conferencing service (joining details below))

Nicholas Thackray (CERN)
Description
grid-operations-meeting@cern.ch
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
Attendees:
  • OSG operations team
  • EGEE operations team
  • EGEE ROC managers
  • WLCG coordination representatives
  • WLCG Tier-1 representatives
  • other site representatives (optional)
  • GGUS representatives
  • VO representatives
  • To dial in to the conference:
    a. Dial +41227676000
    b. Enter access code 0140768


    NB: Reports were not received in advance of the meeting from:

  • ROCs: Russia, SouthEast Europe
  • VOs:
  • list of actions
    Minutes
      • 16:00 16:00
        Feedback on last meeting's minutes
      • 16:01 16:30
        EGEE Items 29m
        • Grid-Operator-on-Duty handover
          From: Central Europe / Italy
          To: Russia / CERN


          Issues:
        • PPS Report & Issues
          PPS reports were not received from these ROCs:
          AP, IT, NE, RU, SEE

          Issues from EGEE ROCs:
          1. No issues reported by the ROCs

          Release News:
          1. gLite3.1.0 PPS-Update09 was released to PPS and is currently in the pre-deployment test phase. The release contains the following fixes:
            • Fix for missing python libraries after restart of the UI (#1257)
            • Fixed dependencies of edg-mkgridmap (#1403)
            • Adjusted dependencies of R-GMA on WNs (#1423)
            • new certificate for voms.cern.ch for 3.1 release (#1452)
            • Updated lcg-info-provider-software (#1470)
            • Updated glite-yaim-bdii to publish site entry (#1471)
            • R3.1 updated a1_grid_env.sh script (#1500)
          2. gLite3.0.2 PPS-Update42 was released to PPS and is currently in pre-deployment (done for lcg-voms certs). The release contains:
            • new voms certificate for the WMS repository
            • upgraded MySQL server on LB
        • Update on gLite service discovery
          Working service discovery of the WMS ProxyServer and LB by the UI was lost with the release of WMS-3.1 (on SL3). This can be resolved by:
          1. Publishing the org.glite.wms.wmproxy and org.glite.lb.server GlueServices.
          2. Adding some extra configuration to the default wms UI configuration files.

          Details: GGUS #28373.

          Bugs will be submitted for the additional relevant YAIM configuration.
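
A minimal sketch of the check implied by step 1, assuming the published services are available as a list of records keyed by the Glue attribute `GlueServiceType`. The sample record and the function name `missing_service_types` are hypothetical, and a real check would query the information system rather than use a hard-coded list.

```python
# Service types named in the item above; the UI needs both to be published.
REQUIRED_TYPES = ("org.glite.wms.wmproxy", "org.glite.lb.server")


def missing_service_types(records, required=REQUIRED_TYPES):
    """Return the required service types that no record publishes."""
    published = {record.get("GlueServiceType") for record in records}
    return [t for t in required if t not in published]


# Hypothetical records, shaped like GlueService entries (assumption).
records = [
    {"GlueServiceType": "org.glite.wms.wmproxy"},
]

# Lists the service types still missing from the published records.
print(missing_service_types(records))
```

An empty result would mean both service types are published; it says nothing about whether the UI configuration from step 2 is in place.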

        • EGEE issues coming from ROC reports
          1. (CERN): Is it possible to modify the target audience when using the new Downtime functionality of the GOC DB / CIC Portal?

          2. (Italy): On 8 November, a DTEAM member submitted a job that caused our batch system (LSF) to create 129198776 pending jobs requesting 129198776 free slots. The original job was, of course, killed.
            ---------
            From late on 7 November until the morning of 8 November, the SAM tests for SRM were failing at INFN-T1. The failures, and the resulting availability figures, were skewed by a flood of requests to CASTOR from DTEAM (a factor of 20 above the usual level).
            We (TIER-1) are configuring an additional set of sanity checks at the batch system level to help prevent this kind of situation.
            ---------
            Given the two DTEAM issues above, as IT-ROC and TIER-1 we ask for confirmation that the acceptable procedure in such cases is:
            1) ban the user(s) (depending on the severity of the problem)
            2) contact the user(s) and the VO
            3) contact the ROC and evaluate whether it is relevant to raise the issue at the weekly meeting
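
As an illustration of the kind of batch-level sanity check mentioned in this item, the sketch below rejects a submission that requests more slots than a configured cap. It is not the Tier-1 configuration: the `Submission` type, the `check_submission` function and the cap value are all assumptions, and a real site would enforce such a limit in the batch system itself.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    user: str
    vo: str
    requested_slots: int


# Hypothetical per-submission cap; each site would choose its own value.
MAX_SLOTS_PER_SUBMISSION = 10_000


def check_submission(sub, max_slots=MAX_SLOTS_PER_SUBMISSION):
    """Return (accepted, reason) for a single submission."""
    if sub.requested_slots <= 0:
        return False, "requested_slots must be positive"
    if sub.requested_slots > max_slots:
        return False, f"requested_slots {sub.requested_slots} exceeds cap {max_slots}"
    return True, "ok"


# The slot count reported in the item above would be rejected by this check.
print(check_submission(Submission("someuser", "dteam", 129198776)))
```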

          3. (Italy): Older FTS client on SL4/gLite3.1.
            On the gLite 3.1 UI (version 3.1.0-2, released on 24.10.07, based on SL4) the glite-data-transfer-cli rpm is quite old (glite-data-transfer-cli-3.3.0-2). An answer has been given at
            https://gus.fzk.de/pages/ticket_details.php?ticket=28749
            https://gus.fzk.de/ws/ticket_info.php?ticket=28749
            that 3.0 and 3.1 follow independent certification paths. Even so, we would normally expect a newer release to ship newer software. In such cases, a notification on the release page would be appreciated.
          4. (NE - site SARA): Point raised last week: GGUS ticket 18826 was assigned to the SAM/SFT support team and confirmed on April 16, but nothing seems to be happening; the site is only asked from time to time whether the problem still exists. See: https://gus.fzk.de/pages/ticket_details.php?ticket=18826
            Update: Maarten Litmaath has updated the ticket and is awaiting a response. Please check the ticket.

        • gLite Release News
        • Removal of support for gLite 3.0 WN and UI
          As the gLite 3.1 WN and UI have now been in production for more than one month, the support for the gLite 3.0 WN and UI will be reduced, from today, to security updates only. ALL support for the gLite 3.0 WN and UI will stop on Monday 10 December.
      • 16:30 17:00
        WLCG Items 30m
        • WLCG issues coming from ROC reports
        • WLCG Service Interventions (with dates / times where known)
          Link to CIC Portal (broadcasts/news), scheduled downtimes (GOCDB) and CERN IT Status Board
          1. [Announcement] FZK (Tier1 GridKa): Scheduled downtime for maintenance on November 6, 8:00-22:00 UTC (9:00-23:00 CET). Upgrade to dCache 1.8. All VOs using the GridKa SE are affected. Data transfers are stopped during this period.

          Time at WLCG T0 and T1 sites.

        • FTS service review 5m

          Please read the report linked to the agenda.

          Speakers: Gavin McCance (CERN), Steve Traylen
        • ATLAS service
        • CMS service
          Speaker: Mr Daniele Bonacorsi (CNAF-INFN BOLOGNA, ITALY)
        • LHCb service
          • No large activity on the grid from LHCb these days. Reconstruction activity has restarted.
          • Worth reporting: we still have a problem with the way we are mapped at the following sites: PIC, IN2P3. Using /lhcb/Role=lcgadmin we are not mapped to the *sgm* account (although this is the YAIM default and should be in place everywhere). This prevents us from running SAM jobs and from installing our new application software.
          Speaker: Dr Roberto Santinelli (CERN/IT/GD)
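
To make the mapping problem above concrete, here is a toy model of first-match mapping from a VOMS FQAN to a local account. The rule table, the account names `lhcbsgm` and `lhcb-pool`, and the function `map_fqan` are hypothetical; the sketch does not reproduce the YAIM or site configuration and is not a diagnosis of what happened at PIC or IN2P3.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping rules, most specific first; account names are made up.
RULES = [
    ("/lhcb/Role=lcgadmin", "lhcbsgm"),
    ("/lhcb*", "lhcb-pool"),
]


def map_fqan(fqan, rules=RULES):
    """Return the account of the first rule whose pattern matches the FQAN."""
    for pattern, account in rules:
        if fnmatchcase(fqan, pattern):
            return account
    return None


print(map_fqan("/lhcb/Role=lcgadmin"))
print(map_fqan("/lhcb"))
```

In this model, rule order matters: if the catch-all `/lhcb*` rule were listed first, `/lhcb/Role=lcgadmin` would fall into the pool account instead of the sgm one.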
        • ALICE service
          Speaker: Dr Patricia Mendez Lorenzo (CERN IT/GD)
        • WLCG Service Coordination
          • WLCG Service Reliability workshop, CERN, November 26 - 30 - agenda - wiki
          • Common Computing Readiness Challenge - CCRC'08 - Meetings schedule
          • CMS CSA07 has been extended till mid-November.
          • The ATLAS M5 detector cosmics run has started and runs until 5 November. Data for reconstruction and export are not expected until later this week.
          Speaker: Harry Renshall / Jamie Shiers
      • 17:00 17:30
        OSG Items 30m
        Speaker: Rob Quick (OSG - Indiana University)
      • 17:30 17:35
        GGUS 6.0 Release of Nov. 29 2007 5m
      • 17:35 17:40
        Review of action items 5m
      • 17:40 17:40
        AOB