WLCG-OSG-EGEE Operations meeting

Europe/Zurich
28-R-15 (CERN conferencing service (joining details below))


Description
grid-operations-meeting@cern.ch
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up, and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
Attendees:
  • OSG operations team
  • EGEE operations team
  • EGEE ROC managers
  • WLCG coordination representatives
  • WLCG Tier-1 representatives
  • other site representatives (optional)
  • GGUS representatives
  • VO representatives
  • To dial in to the conference:
    a. Dial +41227676000
    b. Enter access code 0140768


    NB: Reports were not received in advance of the meeting from:

  • ROCs: AP, Italy
  • VOs: ALICE, ATLAS, CMS, LHCb
  • Recording of the meeting
      • 4:00 PM 4:00 PM
        Feedback on last meeting's minutes
        Minutes
      • 4:01 PM 4:30 PM
        EGEE Items 29m
        • Grid-Operator-on-Duty handover
          From: DECH/ UKI
          To: SWE/ Russia


          Issues: GOCDB outage on the 7th-8th due to a power cut at RAL. No other problems.
        • PPS Report & Issues
          PPS reports were not received from these ROCs:
          AP, CE, IT, SEE, SWE

          Issues from EGEE ROCs:
          1. None reported

          Release News:
          1. gLite 3.0.2 PPS Update45 was released to pre-production last Tuesday.
            It is currently in the pre-deployment testing phase.
            The update contains:
            • YAIM module for the 3.0 WMS to fix the UID limit bug for the gridftp server
            All details in:
            https://twiki.cern.ch/twiki/bin/view/EGEE/PPSReleaseNotes_302_PPS_Update45
          2. gLite 3.1.0 PPS Update17 was released to pre-production last Thursday.
            It is currently being installed at PPS sites after pre-deployment testing.
            The update contains:
            • glite-MPI_utils metapackage for gLite 3.1
            • Improved globus-gridftp startup script
            • various improvements for glite-info-provider-ldap
            • lcg_util v1.6.8 (SLC4)
            All details in:
            https://twiki.cern.ch/twiki/bin/view/EGEE/PPSReleaseNotes_310_PPS_Update17
        • EGEE issues coming from ROC reports
          1. (ROC France): This site had to change its domain name urgently from "mrs.grid.cnrs.fr" to "in2p3.fr". A scheduled downtime is ongoing, but all old node names (and IPs) have already been replaced by the new ones in the GOC DB. During these operations, the site wondered whether it is possible to set an alias on a CE node. Is it possible? Has any other site tried this?
          2. (ROC Russia): I would like to draw your attention to the long and unsuccessful history of lcg_util updates. A new version was issued for PPS. However, the update did not include the patch for lcg-rep and Classic SE (see bug #32999 in Savannah), even though this bug was fixed two weeks ago. Maarten Litmaath said that "release to production would be 1 or 2 weeks later" (see "Re: [LCG-ROLLOUT] RM SAM test on CE and Classic SE", Fri, 8 Feb 2008). So sites that applied the update, as the recommended operational procedure requires, and still use Classic SE cannot work properly for a month or so. Is it really so complicated to roll back to the situation before the update? Who can at least send site administrators a recommendation for rolling back manually? By the way, I think a situation like this may occur again in the future. I propose that we think about an emergency rollback procedure, manual or automatic.
          3. (ROC SWE): In the last few days, several sites in the SWE federation have been experiencing problems with the Information System. It is not clear whether this is correlated with upgrading to the latest version of the middleware or with its YAIM configuration. We want to raise this in the GridOps meeting to see if sites in other federations are seeing something similar.
        • gLite Release News
          1. gLite 3.1 Update13 released to production today.
            The update contains:
            • A major upgrade to dCache (patch #1395)
            • An update from VDT to fix a gridftp issue
            • voms-admin client for UI and VOBOX
            • dcacheVoms2Gplasma, required for proxies created with grid-proxy-init
            All details in:
            http://glite.web.cern.ch/glite/packages/R3.1/updates.asp
        • Phase out of classic SE 5m
          Sites/VOs are requested to migrate in the next 3 months, before the end of May. A broadcast will be sent with the details. A migration to DPM is the suggested solution. https://twiki.cern.ch/twiki/bin/view/LCG/ClassicSeToDpm
      • 4:30 PM 5:00 PM
        WLCG Items 30m
        • CCRC'08 Operational Review 30m
          Weekly review of on-going CCRC'08 activities based on 3 agreed metrics:
          1. Experiments' scaling factors for functional blocks exercised in the challenge
          2. Experiments' critical services lists
          3. MoU targets
          Minutes of daily CCRC08 meetings
          Slides
      • 5:00 PM 5:30 PM
        OSG Items 30m
        Speaker: Rob Quick (OSG - Indiana University)
        • Discussion of open tickets for OSG
          Escalation Reports
      • 5:30 PM 5:35 PM
        Review of action items 5m
        list of actions
      • 5:35 PM 5:35 PM
        AOB
        1. Item 1