WLCG-OSG-EGEE Operations meeting

Name: WLCG-OSG-EGEE Operations meeting
Start: 2008-01-14T16:00:00+01:00
End: 2008-01-14T18:00:00+01:00
Location: CERN conferencing service (joining details below)

Monday 14 Jan 2008, 16:00 → 18:00 Europe/Zurich

28-R-15 (CERN conferencing service (joining details below))

Description

grid-operations-meeting@cern.ch
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. Reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum where sites get a summary of the week's WLCG activities and plans.
Attendees:

OSG operations team

EGEE operations team

EGEE ROC managers

WLCG coordination representatives

WLCG Tier-1 representatives

other site representatives (optional)

GGUS representatives

VO representatives

To dial in to the conference:
a. Dial +41227676000
b. Enter access code 0140768

NB: Reports were not received in advance of the meeting from:

ROCs:

VOs:

- 16:00 → 16:01
  
  Feedback on last meeting's minutes 1m
  
  Minutes
- 16:01 → 16:30
  EGEE Items 29m
  - Grid-Operator-on-Duty handover
    
    From: Italy / SW Europe
    To: Central Europe / France
    
    Issues from SW Europe COD:
    No major issues to raise.
    Issues from Italian COD:
    No major issues to raise.
  - PPS Report & Issues
    
    PPS reports were not received from these ROCs:
    
    Issues from EGEE ROCs:
    
    None.
  - EGEE issues coming from ROC reports
    
    (ROC CE): It looks like there is a central problem with accounting data. The list of sites not publishing accounting data includes about 40 sites that suddenly stopped publishing in December 2007: http://www3.egee.cesga.es/acctenfor/nodata.php
    Some sites in CE reported APEL problems similar to this bug: https://savannah.cern.ch/bugs/?32435
    Could the APEL developers comment on this?
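The check behind the CE listing can be illustrated with a minimal Python sketch: given the date of each site's last published accounting record, list the sites that have gone quiet and those that stopped in the same month. The site names, dates and the single cutoff date are hypothetical; real data would come from the accounting portal linked above, whose export format is not assumed here.

```python
from datetime import date

# Hypothetical sample data: site name -> date of its last published
# accounting record. Real values would come from the accounting portal.
LAST_PUBLISHED = {
    "SITE-A": date(2008, 1, 12),
    "SITE-B": date(2007, 12, 3),
    "SITE-C": date(2007, 12, 15),
    "SITE-D": date(2007, 11, 20),
}


def stale_sites(last_published, cutoff):
    """Return the sorted names of sites whose last record is older than cutoff."""
    return sorted(site for site, last in last_published.items() if last < cutoff)


def stopped_in_month(last_published, year, month):
    """Return the sorted names of sites whose last record falls in year/month.

    Many sites stopping in the same month suggests a common, central cause
    rather than many independent site problems.
    """
    return sorted(
        site
        for site, last in last_published.items()
        if last.year == year and last.month == month
    )


if __name__ == "__main__":
    print(stale_sites(LAST_PUBLISHED, date(2008, 1, 1)))
    print(stopped_in_month(LAST_PUBLISHED, 2007, 12))
```

With the sample data, three sites are stale at a 1 January 2008 cutoff, and two of them (SITE-B, SITE-C) stopped in December 2007.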
    
    (ROC CE): When can we expect the MON box to be available on SL(C)4? For sites running SL4, it is one of the remaining SL3 dependencies.
  - gLite Release News
- 16:30 → 17:00
  WLCG Items 30m
  - WLCG issues coming from ROC reports
    
    None.
  - WLCG Service Interventions (with dates / times where known)
    
    Link to CIC Portal (broadcasts/news), scheduled downtimes (GOCDB) and CERN IT Status Board
    
    None.
    
    Time at WLCG T0 and T1 sites.
  - FTS service review 5m
    
    Please read the report linked to the agenda.
    In particular ?
    
    Speakers: Gavin McCance (CERN), Steve Traylen
  - ATLAS service
    
    See also https://twiki.cern.ch/twiki/bin/view/Atlas/TierZero20071 and https://twiki.cern.ch/twiki/bin/view/Atlas/ComputingOperations for more information.
    
    Storage Space:
    Each site should publish up-to-date values for the following fields in the Information System:
    
    GlueSAStateAvailableSpace
    GlueSATotalOnlineSize
    GlueSAUsedOnlineSize
    for:
    
    each storage area with associated space tokens
    each storage area associated with "default spaces" for a given storage class
    This information is crucial for CCRC08.
    Thanks in advance.
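A site could check its own publication with a small script. The Python sketch below parses simple LDIF text (the form an LDAP query against the Information System returns) and reports every GlueSA entry that lacks one of the three requested attributes. The DNs, values and sample text are invented for illustration, and the parser deliberately ignores LDIF line folding and base64 values; no units are implied by the sample numbers.

```python
# The three attributes ATLAS asked each storage area to publish.
REQUIRED = ("GlueSAStateAvailableSpace", "GlueSATotalOnlineSize", "GlueSAUsedOnlineSize")

# Hypothetical LDIF excerpt: DNs and values are invented for illustration.
SAMPLE_LDIF = """\
dn: GlueSALocalID=atlas:ATLASDATADISK,GlueSEUniqueID=se.example.org,mds-vo-name=local,o=grid
objectClass: GlueSA
GlueSAStateAvailableSpace: 52428800
GlueSATotalOnlineSize: 100
GlueSAUsedOnlineSize: 50

dn: GlueSALocalID=atlas:default,GlueSEUniqueID=se.example.org,mds-vo-name=local,o=grid
objectClass: GlueSA
GlueSAStateAvailableSpace: 1048576
"""


def parse_ldif(text):
    """Split simple LDIF into a list of {attribute: [values]} dicts.

    Only blank-line-separated entries of 'name: value' lines are handled;
    LDIF line folding and base64 ('name:: value') are not.
    """
    entries = []
    for block in text.strip().split("\n\n"):
        entry = {}
        for line in block.splitlines():
            name, _, value = line.partition(": ")
            entry.setdefault(name, []).append(value)
        entries.append(entry)
    return entries


def missing_attributes(entries):
    """Map the dn of each GlueSA entry to the required attributes it lacks."""
    report = {}
    for entry in entries:
        if "GlueSA" not in entry.get("objectClass", []):
            continue
        missing = [attr for attr in REQUIRED if attr not in entry]
        if missing:
            report[entry["dn"][0]] = missing
    return report


if __name__ == "__main__":
    for dn, attrs in missing_attributes(parse_ldif(SAMPLE_LDIF)).items():
        print(dn, "is missing", ", ".join(attrs))
```

In the sample, the "default" storage area publishes only the available space, so it is reported as missing the two online-size attributes.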
    SE/SRM SAM critical tests for the BNL Tier-1 have been failing since mid-December.
    GridView Results.
    ATLAS would like to know the status of, and the schedule for, srmls on lxplus: at present it is deployed only on the CERN PPS.
    
    Comments from the operations team on the three points above:
    
    Publication of storage space information does not yet appear to be in a released version of either dCache or DPM.
    SAM test results: GGUS ticket 31218. To me at least (SteveT), the ticket describes the wrong problem; the actual problem is "No space for atlas".
    srmls apparently already exists: /afs/cern.ch/project/gd/LCG-share/current/d-cache/srm/bin/srmls
  - CMS service
    
    Item 1
    
    Speaker: Mr Daniele Bonacorsi (CNAF-INFN BOLOGNA, ITALY)
  - LHCb service
    
    rfio problems at CNAF (and now also at RAL). The problem (the connection hangs when a file on the SE is read from the WN using the rfio protocol) is being investigated by the CASTOR team with support from CNAF. However, since CNAF has been out of the production mask for months now (which is hurting the accounting), we are looking for the quickest fix: accessing files through rootd rather than rfiod. This approach has been shown to work at CERN, where it is used successfully.
    
    With this report I would like to recall this issue (which heavily penalizes the LHCb computing mask) and to set out some actions that should be followed up consistently:
    
    1. CASTOR and CNAF teams to debug the rfio problem.
    2. CNAF to install, configure and test rootd, with support from the FIO and CASTOR teams at CERN; this is foreseen for this week.
    3. If the recipe works at CNAF, involve RAL in point 2.
    
    Speaker: Dr Roberto Santinelli (CERN/IT/GD)
  - ALICE service
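The LHCb workaround amounts to a per-site choice of access protocol: prefer rootd where rfio hangs, and extend the same switch to RAL if the CNAF recipe works. The Python sketch below shows that selection logic under stated assumptions: the preference table, host names and the simplified `proto://host/path` URL layout are all hypothetical, and real transfer URLs returned by CASTOR may use a different syntax.

```python
# Hypothetical per-site protocol preferences; the first entry is tried first.
PROTOCOL_PREFERENCE = {
    "CERN": ["root", "rfio"],  # rootd access reported to work at CERN
    "CNAF": ["root"],          # workaround: avoid rfio while the hang is debugged
    "RAL": ["rfio"],           # would switch to ["root"] if the CNAF recipe works
}

# Hypothetical fallback for sites not listed above.
DEFAULT_PREFERENCE = ["rfio"]


def build_turl(site, host, path, available):
    """Pick the first preferred protocol the SE supports and build a URL.

    `available` is the set of protocols the SE offers. The URL layout is a
    simplified illustration. Raises ValueError if no preferred protocol
    is available.
    """
    for proto in PROTOCOL_PREFERENCE.get(site, DEFAULT_PREFERENCE):
        if proto in available:
            return f"{proto}://{host}/{path.lstrip('/')}"
    raise ValueError(f"no usable protocol for {site}")


if __name__ == "__main__":
    print(build_turl("CNAF", "castor.example.it", "/castor/lhcb/f.dst", {"rfio", "root"}))
```

Keeping the choice in one table means that step 3 of the plan (moving RAL to rootd) would be a one-line change, and a site with no usable protocol fails loudly instead of silently falling back to rfio.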
    
    Item 1
    
    Speaker: Dr Patricia Mendez Lorenzo (CERN IT/GD)
  - WLCG Service Coordination
    
    Item 1
    
    Speaker: Harry Renshall / Jamie Shiers
- 17:00 → 17:30
  OSG Items 30m
  
  Speaker: Rob Quick (OSG - Indiana University)
  - Discussion of open tickets for OSG
    
    https://gus.fzk.de/pages/metrics/download_escalation_reports_roc.php
- 17:30 → 17:35
  
  Review of action items 5m
  
  list of actions
- 17:35 → 17:36
  AOB 1m
  1. Item 1