Nick Thackray
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum where sites get a summary of the weekly WLCG activities and plans.
  • OSG operations team
  • EGEE operations team
  • EGEE ROC managers
  • WLCG coordination representatives
  • WLCG Tier-1 representatives
  • other site representatives (optional)
  • GGUS representatives
  • VO representatives
      • 16:00 16:00
        Feedback on last meeting's minutes
      • 16:01 16:30
        EGEE Items 29m
        • Grid-Operator-on-Duty handover
          From: Asia Pacific and Central Europe
          To: SouthEast Europe and DECH

          Report from Asia Pacific:
          • Nothing to report.

          Report from CE:
          • Additional information about a ticket set to 'Case transferred to political instances':
            Site name: UKI-LT2-QMUL
            ROC: UKI
            Ticket id: 8997 (GGUS id: 40945)
            Problem: RGMA-host-cert-valid
            The ticket has been in 'Case transferred to political instances' status since 2008-09-30 with no progress. The node is now in scheduled downtime.
        • PPS Report & Issues
          Please find issues from the EGEE ROCs and general information in:

        • gLite Release News
        • EGEE issues coming from ROC reports
          1. ROC CE: Some site administrators complain that they cannot fill in their weekly reports. The details link is empty, so it is impossible to check why a failure occurred. Some even suggest that the reports show failures that never happened.

          2. ROC France: For information: IN2P3-CC T1 has now succeeded in configuring the GIP on its CEs to restrict CMS access to both VOMS:/cms/Role=production and VOMS:/cms/Role=lcgadmin. This configuration works only with the gLite WMS, but CMS agreed to it because its production is handled entirely through the gLite WMS. Some CMS monitoring problems still have to be solved, but CMS production job submission has been shown to work with this configuration.
            Details of how the configuration was done (with help from Steve Traylen) can be found in GGUS ticket #37102.
            This solution is close to Steve's proposal described on the following wiki page, but the page needs some modifications: http://goc.grid.sinica.edu.tw/gocwiki/How_to_publish_queues_with_access_restricted_to_a_FQAN. Steve, could you please update your page?

          3. ROC France: Between 10/10 and 13/10, various SAM failures were raised, but these appear to be false alerts. Moreover, no details were provided on the SAM test details web page. See for example:
            https://lcg-sam.cern.ch:8443/sam/sam.py?funct=TestResult&nodename=lyogrid02.in2p3.fr&vo=OPS&testname=CE-sft-job&testtimestamp=1223606240
            https://lcg-sam.cern.ch:8443/sam/sam.py?funct=TestResult&nodename=cclcgceli05.in2p3.fr&vo=OPS&testname=CE-sft-lcg-rm&testtimestamp=1223599915

          4. ROC SWE: PIC comment: In GridMap monitoring (http://gridmap.cern.ch), clicking the "show SI2k" button in the "topology view" section scales the sites by their "total cpus" value in SI2k units. This value appears to be computed simply by multiplying the number of published job slots by GlueHostBenchmarkSI00. Since most clusters are not homogeneous, this is not correct: GlueHostBenchmarkSI00 is only the value to which internal accounting is normalized.
            LIP comment: Failures are shown on the ROC report for the LIP-Lisbon CEs, but none of them appeared in SAM. Is there a synchronization problem between the ROC report and the SAM DB (10, 11 and 12 October, ce02.pic.pt)?
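          The following Python sketch illustrates the PIC point in item 4. For a heterogeneous site, it compares the scaling GridMap appears to use (total published job slots times GlueHostBenchmarkSI00) with a per-sub-cluster sum. The sub-cluster sizes and SI2k ratings are hypothetical and chosen only for illustration. Treating GlueHostBenchmarkSI00 as the accounting normalization value follows the PIC comment; the GridMap formula is as described there, not taken from GridMap's code.

```python
"""Sketch: why slots x GlueHostBenchmarkSI00 misestimates a heterogeneous site.

All figures below are hypothetical and for illustration only.
"""

# Hypothetical sub-clusters of one site: (job_slots, SI2k rating per slot).
SUBCLUSTERS = [
    (200, 1500),  # older worker nodes
    (300, 2500),  # newer worker nodes
]

# Hypothetical published GlueHostBenchmarkSI00. Per the PIC comment, this is
# only the reference value internal accounting is normalized to.
PUBLISHED_SI00 = 2000


def naive_capacity(subclusters, published_si00):
    """Scaling as GridMap appears to do it: total slots times published SI00."""
    total_slots = sum(slots for slots, _ in subclusters)
    return total_slots * published_si00


def summed_capacity(subclusters):
    """Capacity summed per sub-cluster, using each one's own SI2k rating."""
    return sum(slots * si2k for slots, si2k in subclusters)


if __name__ == "__main__":
    naive = naive_capacity(SUBCLUSTERS, PUBLISHED_SI00)
    summed = summed_capacity(SUBCLUSTERS)
    print(f"naive:      {naive} SI2k")
    print(f"summed:     {summed} SI2k")
    print(f"difference: {summed - naive} SI2k")
```

          With these made-up numbers, the naive product is 1,000,000 SI2k, while the per-sub-cluster sum is 1,050,000 SI2k. The gap depends on how far the published reference value is from the site's actual mix of hardware.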
        • gLite 3.0 services NOW OBSOLETE
          • glite-SE_classic
          • glite-VOBOX
          • glite-WMS
          • glite-PX
          • glite-MON

          An announcement of this retirement is already on the gLite 3.0 page:
          This follows the procedure (until we have a new one) that was discussed at the operations meeting in February 2008:
      • 16:30 17:00
        WLCG Items 30m
        • WLCG issues coming from ROC reports
          1. ROC Russia: ATLAS has requested that its files be cleaned up at (at least) the Russian sites. The following procedure is proposed (DPM version):
            1. delete all files in the specified directories;
            2. clean up the database using the dpns-rm command;
            3. all external links are the responsibility of the ATLAS VO.
            There are two questions:
            1. Why should site managers do this at all, when VO administrators have sufficient access rights to do it themselves?
            2. Is it a procedure approved by WLCG project management?
        • WLCG Service Interventions (with dates/times where known)
          Link to CIC Portal (broadcasts/news), scheduled downtimes (GOCDB) and CERN IT Status Board

          Many interventions scheduled this week. Please consult the URLs above for details.

          Time at WLCG T0 and T1 sites.

        • WLCG Operational Review
          Speaker: Harry Renshall / Jamie Shiers
        • ALICE report
          1. Item
        • ATLAS report
          1. Item
        • CMS report
          1. Item
          Speaker: Daniele Bonacorsi
        • LHCb report
          1. Item
        • Storage services: Recommended base versions
          The recommended baseline versions for the storage solutions can be found here: https://twiki.cern.ch/twiki/bin/view/LCG/GSSDCCRCBaseVersions

        • Storage services: this week's updates
      • 17:00 17:30
        OSG Items 30m
        Speaker: Rob Quick (OSG - Indiana University)
        • Discussion of open tickets for OSG
      • 17:30 17:35
        Review of action items 5m
      • 17:35 17:35