WLCG-OSG-EGEE Operations meeting
Date/Time: Monday, 13 October 2008 - 16:00 (Europe/Zurich)
Location: CERN conferencing service (joining details below) ( 28-R-15 )
Chairperson: Nick Thackray
Description: grid-operations-meeting@cern.ch
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure, based on weekly reports from the attendees. Reported issues are discussed, assigned to the relevant teams, followed up, and escalated when needed. The meeting is also the forum for sites to get a summary of the weekly WLCG activities and plans.
Attendees:
  • OSG operations team
  • EGEE operations team
  • EGEE ROC managers
  • WLCG coordination representatives
  • WLCG Tier-1 representatives
  • other site representatives (optional)
  • GGUS representatives
  • VO representatives
  To dial in to the conference:
    a. Dial +41227676000
    b. Enter access code 0140768

    OR click HERE
    (Please specify your name & affiliation in the web-interface)

    Click here for minutes of all meetings

    Click here for the List of Actions

    Material: Recording of the meeting

     
     Monday, 13 October 2008
     16:00
    Feedback on last meeting's minutes    
     16:01
    EGEE Items (29')    
    • Grid-Operator-on-Duty handover
      From: UKI and Russia
      To: Taiwan and CE

      Report from Russia:
        Russian COD as a Backup team:
      • opened: 37
      • closed: 26
      • 2nd mail: 10
      • extended: 21
      • total: 94


      Report from UKI:
     
     
     
    • EGEE issues coming from ROC reports
      • UKI: No data available in ROC (or site) report(s) for the failures from SAM framework section.
     
    • gLite 3.1 update 33, BDII (10')
      Details on the changes of gLite 3.1 update 33 for the BDII
      Dear colleagues, the status of gLite 3.1 Update 33 is as follows:
      1. The glite-BDII (top-level BDII) meta-rpm for Update 33 was removed on Friday. At the same time the previous meta-rpm was changed to require exactly the previous version (3.9.1-5) of the bdii rpm. Sites that already upgraded their top-level BDIIs before these changes may want to downgrade (but see below). Resource and site BDIIs were not seen to display the instabilities described in Savannah bug #42727, therefore the meta-rpms for other node types have not been changed. The top-level BDII instability is being looked into with high priority.
      2. The "chown" problem reported by Michel Jouvin does not affect sites that use YAIM for their configurations. A fix for this problem has been coded and a new bdii version is being certified. It is expected to be released to the production system this week.
    Laurence Field (CERN)  
    • gLite 3.0 services to be obsoleted (5')
      • glite-SE_classic
      • glite-VOBOX
      • glite-WMS
      • glite-PX
      • glite-MON

      An announcement of this retirement is already on the gLite 3.0 page:
      http://glite.web.cern.ch/glite/packages/R3.0/
      This corresponds to the procedure (until we have a new one) that was discussed in the ops meeting in Feb 08:
      https://twiki.cern.ch/twiki/bin/view/EGEE/WlcgOsgEgeeOpsMinutes2008x02x25#Support_for_gLite_3_0_services
      PLEASE LET US KNOW OF ANY OBJECTIONS BY NEXT WEEK!
     
    • Proposed process for removing SA1 support for old gLite services
      Attached is a proposed process for removing support for obsolete gLite services and out-of-date versions of services. Please read and comment as soon as possible.
     
     16:30
    WLCG Items (30')    
    • WLCG issues coming from ROC reports
      1. France: TEAM/ALARM tickets for T1s: how do the LHC experiments choose between these two types of ticket?
        ATLAS:
        -- ALARM tickets are for problems concerning T0 (mainly problems at a T1 blocking data acceptance from T0).
        -- TEAM tickets are for all other problems of importance (mainly T1<->T2 transfers for the moment). Currently under discussion: if the problem is not acknowledged by the site before 2 PM the following day, an ALARM ticket is sent.
        Could CMS, ALICE and LHCb state explicitly when they use each type of ticket?
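The ATLAS rule above can be written down as a small decision sketch. This is only an illustration: the function names are hypothetical, the 2 PM escalation is still "in discussion" according to the report, and the time zone handling (naive local times) is an assumption.

```python
from datetime import datetime, time, timedelta


def initial_ticket_type(concerns_t0: bool) -> str:
    """ALARM for problems concerning T0, TEAM for other important problems."""
    return "ALARM" if concerns_t0 else "TEAM"


def escalation_deadline(opened: datetime) -> datetime:
    """14:00 on the day after the TEAM ticket was opened (rule under discussion)."""
    next_day = opened.date() + timedelta(days=1)
    return datetime.combine(next_day, time(14, 0))


def should_escalate(opened: datetime, acknowledged: bool, now: datetime) -> bool:
    """True if an unacknowledged TEAM ticket has reached its deadline."""
    return not acknowledged and now >= escalation_deadline(opened)
```

For example, a TEAM ticket opened during this meeting (13 October 2008, 16:00) would become a candidate for an ALARM ticket at 14:00 on 14 October if the site had not acknowledged it by then.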
     
    • status of the WMS for Alice (15')
      Alice wants to fully replace the RBs and use only the WMS in production at all sites. In Alice's computing model it is recommended (not mandatory) that sites provide a local WMS, though Alice understands that for some T2 sites this can be very difficult. Alice would like to ask the T1 sites, and in general all sites providing RBs to Alice, to migrate to the WMS. The first target sites in particular are NIKHEF and CCIN2P3.
      • NIKHEF: providing 2 RBs but no WMS yet
      • IN2P3: no WMS there supports Alice. In France there are only 2, both at T2 sites: datagrid.cea.fr and lal.in2p3.fr. Alice would like to ask IN2P3 to provide one as well.
     
    • CREAM CE for Alice (& PPS pilot service)
      Alice would like to start using the CREAM CE in production. To do this, Alice has the following requirements on sites:
      • Keep current LCG CE and install CREAM CE on another box.
      • Install a 2nd VOBOX pointing to the CREAM CE. The VOBOX can be in a virtual machine if the site is short of boxes.
      • Point the CREAM CE to the standard Alice production queue.
      • Need a GridFTP server somewhere on the site.
      This request also presents another opportunity: any sites that wish to support Alice with the CREAM CE could also support the testing of the new ICE-enabled WMS, simply by installing the latest version of the CREAM CE (available in the PPS repositories) rather than the version currently in the production repositories. Sites wishing to do this would also need to configure CMS as a VO on their site - no other action is needed on the part of the site.

      Any sites that are interested should contact occ-grid-support@cern.ch. Installation instructions for the CREAM CE will be provided.

      Alice would like to ask that all LCG tier-1s (which support the Alice VO) contribute to this task. Alice would also like to invite as many tier-2 sites as possible to join in.
     
     
    Harry Renshall / Jamie Shiers  
    • Alice report
     
    • Atlas report
      1. The site LPNHE (part of GRIF) is in downtime:
        https://goc.gridops.org/downtime/list?id=10455542
        but no RSS feed entry has been sent about it:
        feed://cic.gridops.org/index_rssflow.php?service=downtime_vo&vo=atlas
        This may help the CIC people tune the RSS feed, which is how the experiments retrieve information about downtimes.
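A check of the kind ATLAS describes (is a given downtime present in the feed?) might look like the sketch below. The structure of the CIC feed is not given in this report, so the sketch assumes plain RSS 2.0 items whose link carries the downtime `id` as a query parameter; the sample feed is invented for illustration and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse, parse_qs


def announced_downtime_ids(rss_xml: str) -> set:
    """Collect the 'id' query parameter from every <item><link> in the feed."""
    root = ET.fromstring(rss_xml)
    ids = set()
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        ids.update(parse_qs(urlparse(link).query).get("id", []))
    return ids


# Invented sample feed (assumption: not real CIC portal output).
SAMPLE = """<rss version="2.0"><channel>
<title>hypothetical downtime feed</title>
<item><title>Downtime at EXAMPLE-SITE</title>
<link>https://goc.gridops.org/downtime/list?id=12345</link></item>
</channel></rss>"""

# The LPNHE downtime id from the report is not in the sample feed.
missing = "10455542" not in announced_downtime_ids(SAMPLE)
```

Comparing the ids in the feed with the ids listed in the GOC DB would show which downtimes were never announced through RSS.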
     
    • CMS report
      None.
    Daniele Bonacorsi  
    • LHCb report
      • Any comments from sites concerning last week's request about the gridmap file for LHCb? If not, I will proceed by formulating an EGEE broadcast asking all sites to implement this "safe" mapping in case of VOMS mapping failure.
      • EGEE downtime announcement procedure:
        1. Announcement of scheduled downtime with a mail "Announcement" at least 24h in advance, as in the MoU.
        2. Start of downtime (scheduled and unscheduled) at the time when it starts, with a mail "Start" (with correct time!)
        3. End of downtime: mail "End" (with correct time)
      • (From Philippe) In the last couple of days we have been receiving update notifications from GGUS for tickets that, according to the web page, were not updated at all (e.g. #41707: the last update was on October 3rd, but we have received mails recently too). Why does this happen?
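The three-step downtime announcement procedure in the LHCb report above can be sketched as a simple check. The names below are hypothetical and the sketch only encodes what the list states: an "Announcement" mail at least 24h ahead for scheduled downtimes, then "Start" and "End" mails for every downtime.

```python
from datetime import datetime, timedelta

# Minimum notice for a scheduled downtime, as stated in the procedure.
MIN_NOTICE = timedelta(hours=24)


def announcement_in_time(announced_at: datetime, starts_at: datetime) -> bool:
    """True if the "Announcement" mail was sent at least 24h before the start."""
    return starts_at - announced_at >= MIN_NOTICE


def expected_mails(scheduled: bool) -> list:
    """Mails expected over the life of a downtime, in order."""
    return (["Announcement"] if scheduled else []) + ["Start", "End"]
```

An unscheduled downtime has no advance announcement, so only the "Start" and "End" mails are expected for it.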
     
     
     
     17:00
    OSG Items (30')   Rob Quick (OSG - Indiana University)  
    • Discussion of open tickets for OSG
     
     17:30
    Review of action items (5')    
     17:35
    AOB