WLCG-OSG-EGEE Operations meeting

Europe/Zurich
28-R-15 (CERN conferencing service (joining details below))

Description
grid-operations-meeting@cern.ch
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure, based on weekly reports from the attendees. Reported issues are discussed, assigned to the relevant teams, followed up, and escalated when needed. The meeting is also the forum where sites get a summary of the weekly WLCG activities and plans.
Attendees:
  • OSG operations team
  • EGEE operations team
  • EGEE ROC managers
  • WLCG coordination representatives
  • WLCG Tier-1 representatives
  • other site representatives (optional)
  • GGUS representatives
  • VO representatives
  • To dial in to the conference:
    a. Dial +41227676000
    b. Enter access code 0140768

    NB: Reports were not received in advance of the meeting from:

  • ROCs: DECH, SEE
  • VOs:
      • 1
        Feedback on last meeting's minutes
      • 2
        EGEE Items
        • a) Grid-Operator-on-Duty handover
          From: SEE / CERN
          To: DECH / Italy


          Report from CERN COD:
          1. The site ru-Chernogolovka-IPCP-LCG2 was reported to the Ops meeting last week for suspension, but the Russian ROC was not represented.
          Report from SEE COD:
          1. No sites to be considered for suspension from our shift.
        • b) PPS Report & Issues
          Please find Issues from EGEE ROCs and general info in:

          https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingPps
        • c) gLite Release News

          Release News:
          Please find gLite release news in:

          https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingGliteReleases
        • d) EGEE issues coming from ROC reports
          1. France: A minor comment concerning the CERN VOMS SD on Monday. It might be better to schedule such a downtime on a day other than Monday: anyone who wanted a valid VOMS proxy during the SD period would have to renew it on Sunday. I heard that some people do not work on Sundays! ;)

          2. UKI (UKI-SOUTHGRID-BRIS-HEP): cerb-mds is in OUTAGE (until August 2008) according to GOC-DB: "Test StoRM server - should not be used for production yet". We don't want SAM tests running on it, but they are. I've emailed sam-support@cern.ch twice to request "no tests please" or to find out the procedure for excluding a test machine from SAM tests. No answer. Can anyone advise how to contact sam-support and get a response? Thanks.
        • e) Short deadline Jobs: status update and batch system configuration
          A Short deadline job is:
          - A job with a deadline constraint, which provides some guarantees about its behavior, and which cannot proceed through prior explicit reservation. Because such jobs have a short execution time and are unexpected and urgent, they cannot be handled on a best-effort basis alone in a full production regime.
          - A plain EGEE job in the following sense: it is submitted, scheduled and returned to the user through the standard mechanisms governing the usage of the resources. In particular, it can be inspected by the usual tools (WMS trace) and is fully accounted for.

          For preliminary information:
          Status:
          - According to bug #31278, the WMS has been OK since February.
          - two sites have SDJ configuration files: LAL (sure) and CEA.

          Documentation:
          http://egee-intranet.web.cern.ch/egee-intranet/NA1/TCG/wgs/SDJ-WG-TEC-v1.1.pdf (section 5.2, the rest is not relevant)
          A full example file will be available shortly (the LAL one, used for more than one year).
          Speaker: Cecile Germain-Renaud (Unknown)
      • 3
        WLCG Items
        • a) WLCG issues coming from ROC reports
          1. Italy: FTS configuration change at INFN-T1: Transfer agents for the LHC VOs have been changed so that zero transfer retries are performed.
        • b) WLCG Service Interventions (with dates / times where known)
          Link to CIC Portal (broadcasts/news), scheduled downtimes (GOCDB) and CERN IT Status Board

          Time at WLCG T0 and T1 sites.

        • c) CCRC'08 Operational Review
          Speaker: Harry Renshall / Jamie Shiers
        • d) ALICE report
        • e) ATLAS report
        • f) CMS report
          Speaker: Daniele Bonacorsi
        • g) LHCb report
          RAL: we are unable to submit our pilots because our rank expression prevents it. The number of locally waiting jobs from other VOs is high enough to make the RAL CEs extremely unattractive. We know that as soon as we move to a consistent use of VOView (through the gLite WMS) we will be able to steer our jobs there anyway, because the rank will then be computed with VO-specific information. The problem is that the site admins there claim (at least on Friday) that many job slots are free and (paradoxically) that an equivalent number of jobs are waiting on the local LRMS.
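
The ranking effect described in the RAL item can be illustrated with a minimal Python sketch. It is not the actual LHCb rank expression: it assumes a rank that simply prefers the CE with the fewest waiting jobs, and the CE names and job counts are hypothetical. It only shows why a rank built on the site-wide waiting count steers jobs away from a CE that a rank built on VO-specific counts (as a VOView would publish) would select.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CEView:
    """Hypothetical snapshot of what a CE publishes (illustrative only)."""
    name: str
    waiting_all_vos: int  # waiting jobs summed over every VO (site-wide view)
    waiting_own_vo: int   # waiting jobs for our VO only (VOView-style value)


def rank_global(ce: CEView) -> int:
    # Assumed rank: fewer waiting jobs across all VOs means a higher rank.
    return -ce.waiting_all_vos


def rank_vo_specific(ce: CEView) -> int:
    # Same assumed rank, but computed from VO-specific information only.
    return -ce.waiting_own_vo


def best_ce(ces: List[CEView], rank: Callable[[CEView], int]) -> str:
    # The CE with the highest rank value wins the match-making.
    return max(ces, key=rank).name


# Hypothetical numbers: many jobs from other VOs queue at the first CE,
# while none of our own VO's jobs are waiting there.
ces = [
    CEView("ce-ral-example", waiting_all_vos=500, waiting_own_vo=0),
    CEView("ce-other-example", waiting_all_vos=20, waiting_own_vo=15),
]

chosen_global = best_ce(ces, rank_global)
chosen_vo = best_ce(ces, rank_vo_specific)
print(chosen_global, chosen_vo)
```

With these made-up numbers the site-wide rank avoids the first CE, whereas the VO-specific rank selects it.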

          VOMS issue: after the intervention on the LCG production Oracle service we had problems getting VOMS proxies for a further 2 hours. The VOMS server did not recover automatically.
      • 4
        OSG Items
        Speaker: Rob Quick (OSG - Indiana University)
        • a) Discussion of open tickets for OSG
      • 5
        Review of action items
      • 6
        AOB