WLCG-OSG-EGEE Operations meeting

Name: WLCG-OSG-EGEE Operations meeting
Start: 2008-10-20T16:00:00+02:00
End: 2008-10-20T18:00:00+02:00
Location: CERN conferencing service (joining details below)

Monday 20 Oct 2008, 16:00 → 18:00 Europe/Zurich

28-R-15 (CERN conferencing service (joining details below))

Nick Thackray

Description

grid-operations-meeting@cern.ch
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
Attendees:

OSG operations team

EGEE operations team

EGEE ROC managers

WLCG coordination representatives

WLCG Tier-1 representatives

other site representatives (optional)

GGUS representatives

VO representatives

To dial in to the conference:
a. Dial +41227676000
b. Enter access code 0148141

OR click HERE
(Please specify your name & affiliation in the web-interface)

Click here for minutes of all meetings

Click here for the List of Actions

- 1
  
  Feedback on last meeting's minutes
- 2
  EGEE Items
  - a) Grid-Operator-on-Duty handover
    
    From: Asia Pacific and Central Europe
    To: SouthEast Europe and DECH
    
    Report from Asia Pacific:
    
    Nothing to report.
    
    Report from CE:
    
    Additional information about the ticket set to 'Case transfered to political instances':
    Site name: UKI-LT2-QMUL
    ROC: UKI
    Ticket id: 8997 (GGUS id: 40945)
    Problem: RGMA-host-cert-valid
    The ticket has been in 'Case transfered to political instances' status since 2008-09-30 with no progress. The node is now in scheduled downtime.
  - b) PPS Report & Issues
    
    Please find Issues from EGEE ROCs and general info in:
    
    https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingPps
  - c) gLite Release News
    
    Please find gLite release news in:
    
    https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingGliteReleases
  - d) EGEE issues coming from ROC reports
    
    ROC CE: Some site admins are complaining that they cannot fill in the weekly reports: the detail link is empty, so it is impossible to check why a failure appeared. Some of them even suggested that the reports show failures that never happened.
    
    ROC France: For information: IN2P3-CC T1 has now succeeded in configuring the GIP of its CEs to restrict CMS access to both VOMS:/cms/Role=production and VOMS:/cms/Role=lcgadmin. This configuration works only with a gLite WMS, but CMS agreed, since its production is handled entirely through the gLite WMS. Some CMS monitoring problems still have to be solved, but CMS production job submission has been shown to work with this configuration.
    The way the configuration was made (with the help of Steve Traylen) can be found in GGUS ticket #37102.
    That solution is close to Steve's proposal explained on this wiki page: http://goc.grid.sinica.edu.tw/gocwiki/How_to_publish_queues_with_access_restricted_to_a_FQAN
    It needs some modifications, however. Steve, could you please update your page?
    
    ROC France: Between 10/10 and 13/10, various SAM failures were raised, but they seem to be false alerts. Moreover, no details were provided on the SAM test details web page. See for example:
    https://lcg-sam.cern.ch:8443/sam/sam.py?funct=TestResult&nodename=lyogrid02.in2p3.fr&vo=OPS&testname=CE-sft-job&testtimestamp=1223606240
    https://lcg-sam.cern.ch:8443/sam/sam.py?funct=TestResult&nodename=cclcgceli05.in2p3.fr&vo=OPS&testname=CE-sft-lcg-rm&testtimestamp=1223599915
    
    ROC SWE: PIC comment: in the GridMap monitoring (http://gridmap.cern.ch), if one clicks the "show SI2k" button in the "topology view" section, the sites are scaled with respect to the "total cpus" value in SI2k units, which appears to be computed simply by multiplying the number of published job slots by GlueHostBenchmarkSI00. As most clusters are not homogeneous, this is not correct: GlueHostBenchmarkSI00 is just the value to which internal accounting is normalized.
    LIP comment: Failures are shown on the ROC report for the LIP-Lisbon CEs, but none of them appeared in SAM. Is there a synchronization problem between the ROC report and the SAM DB (10/11/12 of October, ce02.pic.pt)?
  - e) gLite 3.0 services NOW OBSOLETE
    
    glite-SE_classic
    glite-VOBOX
    glite-WMS
    glite-PX
    glite-MON
    
    An announcement of this retirement is already on the gLite 3.0 page:
    http://glite.web.cern.ch/glite/packages/R3.0/
    This corresponds to the procedure (until we have a new one) that was discussed in the ops meeting in Feb 08:
    https://twiki.cern.ch/twiki/bin/view/EGEE/WlcgOsgEgeeOpsMinutes2008x02x25#Support_for_gLite_3_0_services
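
To illustrate the PIC comment on GridMap under the ROC reports above, the following is a minimal sketch of why multiplying the total number of job slots by a single benchmark value can misstate the capacity of a non-homogeneous site. The `SubCluster` class, both function names and all numbers are hypothetical and chosen only for illustration; this is not how GridMap is known to be implemented, only the calculation the comment suspects.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubCluster:
    """One homogeneous group of worker nodes (hypothetical model)."""
    job_slots: int          # number of job slots in this group
    si2k_per_slot: float    # actual SI2k rating of one slot in this group


def naive_capacity(subclusters: List[SubCluster], published_benchmark: float) -> float:
    """Total slots times one published value (the suspected calculation)."""
    total_slots = sum(sc.job_slots for sc in subclusters)
    return total_slots * published_benchmark


def weighted_capacity(subclusters: List[SubCluster]) -> float:
    """Sum of slots times the per-group rating, group by group."""
    return sum(sc.job_slots * sc.si2k_per_slot for sc in subclusters)


# Hypothetical site: 100 older slots and 300 newer, faster slots.
site = [SubCluster(100, 1000.0), SubCluster(300, 2000.0)]

# Hypothetical published value, used here only as an accounting normalization.
published = 1000.0

print(naive_capacity(site, published))   # 400000.0
print(weighted_capacity(site))           # 700000.0
```

With these assumed numbers the single-value estimate gives 400000 SI2k, while the per-group sum gives 700000 SI2k. The two agree only when every slot has the published rating, that is, when the cluster is homogeneous.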
- 3
  WLCG Items
  - a) WLCG issues coming from ROC reports
    
    ROC Russia: There is a request from ATLAS to clean up their files, at least at Russian sites. The following procedure is proposed (DPM version):
    
    Delete all files in the specified directories.
    Clean the database using the dpns-rm command.
    All external links are the responsibility of the ATLAS VO.
    There are two questions:
    
    Why should site managers ever do this, when VO administrators have sufficient access rights to do it themselves?
    Is this a procedure approved by WLCG project management?
  - b) WLCG Service Interventions (with dates / times where known)
    
    Link to CIC Portal (broadcasts/news), scheduled downtimes (GOCDB) and CERN IT Status Board
    
    Many interventions scheduled this week. Please consult the URLs above for details.
    
    Time at WLCG T0 and T1 sites.
  - c) WLCG Operational Review
    
    https://twiki.cern.ch/twiki/bin/view/LCG/WLCGDailyMeetingsWeek081013
    
    Speaker: Harry Renshall / Jamie Shiers
  - d) ALICE report
    
    Item
  - e) ATLAS report
    
    Item
  - f) CMS report
    
    Item
    
    Speaker: Daniele Bonacorsi
  - g) LHCb report
    
    Item
  - h) Storage services: Recommended base versions
    
    The recommended baseline versions for the storage solutions can be found here: https://twiki.cern.ch/twiki/bin/view/LCG/GSSDCCRCBaseVersions
  - i) Storage services: this week's updates
    
    Refer to the wiki page here: https://twiki.cern.ch/twiki/bin/view/LCG/CCRC08StorageStatus
- 4
  OSG Items
  
  Speaker: Rob Quick (OSG - Indiana University)
  - a) Discussion of open tickets for OSG
- 5
  
  Review of action items
- 6
  
  AOB