WLCG-OSG-EGEE Operations meeting

Start: 2008-06-16T16:00:00+02:00
End: 2008-06-16T18:00:00+02:00
Location: CERN conferencing service (joining details below)

Monday 16 Jun 2008, 16:00 → 18:00 Europe/Zurich

28-R-15 (CERN conferencing service (joining details below))

Description

grid-operations-meeting@cern.ch
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure, based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up, and escalated when needed. The meeting is also the forum where sites get a summary of the weekly WLCG activities and plans.
Attendees:

OSG operations team

EGEE operations team

EGEE ROC managers

WLCG coordination representatives

WLCG Tier-1 representatives

other site representatives (optional)

GGUS representatives

VO representatives

To dial in to the conference:
a. Dial +41227676000
b. Enter access code 0140768

NB: Reports were not received in advance of the meeting from:

ROCs: DECH, SEE

VOs:

- 16:00 → 16:01
  
  Feedback on last meeting's minutes 1m
- 16:01 → 16:30
  EGEE Items 29m
  - Grid-Operator-on-Duty handover
    
    From: SEE / CERN
    To: DECH / Italy
    
    Report from CERN COD:
    
    The site ru-Chernogolovka-IPCP-LCG2 was reported to the Ops meeting last week for suspension, but the Russian ROC was not represented.
    
    Report from SEE COD:
    
    No sites to be considered for suspension from our shift.
  - PPS Report & Issues
    
    Please find Issues from EGEE ROCs and general info in:
    
    https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingPps
  - gLite Release News
    
    Release News:
    Please find gLite release news in:
    
    https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingGliteReleases
  - EGEE issues coming from ROC reports
    
    France: A small comment concerning the CERN VOMS SD on Monday. It might be better to schedule such a downtime on a day other than Monday, because anyone who wants a valid VOMS proxy during the SD period has to renew it on Sunday. I heard that some people do not work on Sunday! ;)
    
    UKI (UKI-SOUTHGRID-BRIS-HEP): cerb-mds is in OUTAGE (until August 2008) according to GOC-DB: "Test StoRM server - should not be used for production yet". We don't want SAM tests running on it, but they are. I've emailed sam-support@cern.ch twice to request "no tests please", or to find out the procedure for excluding a test machine from SAM tests. No answer. Can anyone advise how to contact sam-support and get a response? Thanks.
  - Short deadline Jobs: status update and batch system configuration
    
    A Short deadline job is:
    - A job with a deadline constraint, which provides some guarantees about its behavior, and which is unable to proceed through prior explicit reservation. Because such jobs have a short execution time and are unexpected and urgent, they cannot be handled on a best-effort basis only in a full production regime.
    - A plain EGEE job in the following sense: it is submitted, scheduled and returned to the user through the standard mechanisms governing the usage of the resources. In particular, it can be inspected by the usual tools (WMS trace) and is fully accounted for.
    
    For preliminary information:
    Status:
    - According to bug #31278, the WMS has been OK since February.
    - two sites have SDJ configuration files: LAL (sure) and CEA.
    
    Documentation:
    http://egee-intranet.web.cern.ch/egee-intranet/NA1/TCG/wgs/SDJ-WG-TEC-v1.1.pdf (section 5.2, the rest is not relevant)
    A full example file will be available shortly (the LAL one, in use for more than a year).
    
    Speaker: Cecile Germain-Renaud (Unknown)
    
- 16:30 → 17:00
  WLCG Items 30m
  - WLCG issues coming from ROC reports
    
    Italy: FTS configuration change at INFN-T1: the transfer agents for the LHC VOs have been changed so that zero transfer retries are performed.
  - WLCG Service Interventions (with dates / times where known)
    
    Link to CIC Portal (broadcasts/news), scheduled downtimes (GOCDB) and CERN IT Status Board
    
    Time at WLCG T0 and T1 sites.
  - CCRC'08 Operational Review
    
    Speaker: Harry Renshall / Jamie Shiers
  - ALICE report
  - ATLAS report
  - CMS report
    
    Speaker: Daniele Bonacorsi
  - LHCb report
    
    RAL: We are not able to submit our pilots because our rank expression prevents it. This is because the number of locally waiting jobs from other VOs is high enough to make the RAL CEs extremely unattractive. We know that as soon as we move to a consistent use of VOView (through the gLite WMS) we will be able to steer our jobs there anyway, because the rank is then computed with VO-specific information. The problem is that the site admins there claim (at least on Friday) that many job slots are free and, paradoxically, that an equivalent number of jobs are waiting on the local LRMS.
    
    VOMS issue: After the intervention on the LCG production Oracle service, we had problems getting VOMS proxies for a further 2 hours. The VOMS server did not recover automatically.
- 17:00 → 17:30
  OSG Items 30m
  
  Speaker: Rob Quick (OSG - Indiana University)
  - Discussion of open tickets for OSG
- 17:30 → 17:35
  
  Review of action items 5m
- 17:35 → 17:36
  
  AOB 1m