EGEE Operations meeting

Europe/Zurich
28-R-15 (CERN conferencing service (joining details below))


Maite Barroso Lopez (CERN)
Description
grid-operations-meeting@cern.ch
Weekly EGEE infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for sites to get a summary of the weekly WLCG activities and plans.
Attendees:
  • EGEE operations team
  • EGEE ROC managers
  • site representatives (optional)
  • GGUS representatives
  • VO representatives
To dial in to the conference:
    a. Dial +41227676000
    b. Enter access code 0148141

    AND click HERE
    (Please specify your name & affiliation in the web-interface)

    Click here for minutes of all meetings

    Click here for the List of Actions

      • 16:00 16:20
        EGEE Items 20m
        • Central Grid-Operator-on-Duty (c-COD) handover
          From Italy to France
          Handover Log:
          1) Several tickets regarding APEL problems have been unresolved for a month. We should involve the APEL experts (at least in the tickets they are not yet investigating) to understand whether there is a common cause or whether each site has a different problem.
          (RO-15-NIPNE) https://gus.fzk.de/ws/ticket_info.php?ticket=54784&from=ID
          (MA-01-CNRST) https://gus.fzk.de/ws/ticket_info.php?ticket=54115&from=ID
          (IN-DAE-VECC-02) https://gus.fzk.de/ws/ticket_info.php?ticket=54839&from=ID
          (RO-11-NIPNE) https://gus.fzk.de/ws/ticket_info.php?ticket=54771&from=ID
          (CA-SCINET-T2) https://gus.fzk.de/ws/ticket_info.php?ticket=54764&from=ID
          (VN-HPCC-HUT-HN) https://gus.fzk.de/ws/ticket_info.php?ticket=54731&from=ID
          (CA-ALBERTA-WESTGRID-T2) https://gus.fzk.de/ws/ticket_info.php?ticket=54707&from=ID
          (CERN-PROD) https://gus.fzk.de/ws/ticket_info.php?ticket=54424&from=ID

          2) Last week very old alarms appeared on the dashboard for nodes that are monitored but marked "not production" (for example grid-ce2.physik.rwth-aachen.de, which is still present). Some sites want to keep a host monitored for test purposes and do not want to receive tickets when it has problems. Moreover, nodes registered in GOC-DB in this way are taken into account for site availability metrics (is that correct?). We should ask the SAM/NAGIOS developers not to trigger alarms for nodes marked "not production" and to exclude them from availability calculations. As a temporary solution, nodes of this type are put in downtime (the relevant items will still appear in the dashboard if problems occur).

          3) A brief report on MPI test status:
          Last Friday: 14 CEs in error, 4 in maintenance.

          - general failures on kg-ce01.cc.kuleuven.be (BEgrid-KULeuven), creamce.reef.man.poznan.pl (PSNC), gilda-ce.rediris.es (RedIRIS_GILDA) and cedric.scai.fraunhofer.de (SCAI)
          - 1 is failing the OPENMPI test (PSNC)
          - 6 are failing the MPICH test (BEgrid-ULB-VUB, INFN-NAPOLI-PAMELA, prague_cesnet_lcg2, PSNC, SZTAKI, Taiwan-LCG2, UFRJ-IF)
          - 2 are failing the MPICH2 test (BEgrid-ULB-VUB, PSNC, Taiwan-LCG2)
        • Pilot Services Report & Issues
          Info about active pilot services at:
          https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingPilots
        • gLite Release News
        • EGEE issues coming from ROC reports

          No major issues raised by any ROCs this week

        • APEL status update
          Speaker: Dr John Gordon (STFC-RAL)
      • 16:30 16:35
        Review of Action Items 5m
      • 16:35 16:40
        AOB 5m