EGEE Operations meeting

Europe/Zurich
28-R-15 (CERN conferencing service (joining details below))


Maite Barroso Lopez (CERN)
Description
grid-operations-meeting@cern.ch
Weekly EGEE infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up, and escalated when needed. The meeting is also the forum for sites to get a summary of the weekly WLCG activities and plans.
Attendees:
  • EGEE operations team
  • EGEE ROC managers
  • site representatives (optional)
  • GGUS representatives
  • VO representatives
  • To dial in to the conference:
    a. Dial +41227676000
    b. Enter access code 0148141

    AND click HERE
    (Please specify your name & affiliation in the web-interface)

    Click here for minutes of all meetings

    Click here for the List of Actions

      • 16:00 - 16:20
        EGEE Items 20m
        • Central Grid-Operator-on-Duty (c-COD) handover
          From Northern Europe to Italy
          Handover Log:
          Dear IT C-COD,
          Although it looks like the dashboard is full, the scenario is such:
          Many AP sites haven't updated their alarms (in OK status) or tickets (expired on 12th) since the weekend.
          Some of the sites are in downtime, and therefore those alarms/tickets are currently ignored.
          Lastly, and this is one for the WLCG meeting: the APEL situation does not appear to have been finally resolved:
          "APEL Publication works normally and records are properly received, yet no update can currently be reflected in SAM or in the accounting portal.
          Note to operators: please ignore alarms on the APEL-Pub test until further notice."
          Therefore, I advise a moratorium on these tickets/alarms until APEL tells us all is OK again.
          Cheers,
          Vera Hansper NE ROC/NDGF C-COD
        • Pilot Services Report & Issues
          Info about active pilot services at:
          https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingPilots
        • gLite Release News
        • EGEE issues coming from ROC reports

          No major issues raised by any ROCs this week

          5 of the 14 ROCs had not submitted their report by 3:30.

        • Fixing MPI sites (from the MPI WG) 15m
          The SAM MPI tests have been raising alarms since this morning, as agreed last week.

          Summary of the update received last week from Isabel Campos (MPI Task Force).
          The current situation is the following:
          There are 90 sites that publish the MPI-START tag: 88 are tested by SAM, and 2 others (IFCA and RAL) are not tested because of the way they publish the SubCluster information.
          Of those sites:
          69 are working fine (67 in SAM, plus IFCA and RAL)
          20 have errors
          2 are in maintenance
          This gives 76% of sites passing the tests (75% if we don't count the sites outside SAM).
          A GGUS ticket has been opened for every site with errors, and most of these sites are actively working on a solution. A guide listing the errors found and possible solutions is available at http://wiki.ifca.es/e-ciencia/index.php/MPI_Errors
          Documentation for MPI Support in EGEE:
          https://twiki.cern.ch/twiki/bin/view/EGEE/MpiTools
          More information about errors in the MPI knowledge DB:
          http://wiki.ifca.es/e-ciencia/index.php/MPI_Errors
        • APEL status update 15m
          Speaker: Dr John Gordon (STFC-RAL)
      • 16:30 - 16:35
        Review of Action Items 5m
      • 16:35 - 16:40
        AOB 5m
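
The pass rates quoted in the MPI update (76% overall, 75% for SAM-tested sites only) can be re-derived from the reported site counts. The sketch below is only an illustrative check, not part of the Task Force report. It assumes that all sites in the error and maintenance categories are tested by SAM, since the two untested sites (IFCA and RAL) are both listed as working. The three category counts sum to 91, one more than the 90 sites quoted. The sketch uses the category counts, and the update does not say which figure is correct. Variable and function names are hypothetical.

```python
# Site counts as reported in the MPI Task Force update.
working_in_sam = 67       # sites passing the SAM MPI tests
working_outside_sam = 2   # IFCA and RAL, working but not tested by SAM
errors = 20               # assumed to be SAM-tested sites
maintenance = 2           # assumed to be SAM-tested sites


def pass_rate(passing: int, total: int) -> float:
    """Return the percentage of passing sites out of the total counted."""
    return 100.0 * passing / total


# Total over all categories, including the two sites outside SAM.
total_all = working_in_sam + working_outside_sam + errors + maintenance
# Total restricted to the sites that SAM actually tests.
total_sam = working_in_sam + errors + maintenance

rate_all = pass_rate(working_in_sam + working_outside_sam, total_all)
rate_sam = pass_rate(working_in_sam, total_sam)

print(f"all sites: {total_all} counted, {rate_all:.1f}% passing")
print(f"SAM only:  {total_sam} counted, {rate_sam:.1f}% passing")
```

Rounded to whole percentages, both results agree with the figures quoted in the update.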