EGEE Operations meeting

Europe/Zurich
28-R-15 (CERN conferencing service (joining details below))


Antonio Retico
Description
grid-operations-meeting@cern.ch
Weekly EGEE infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
Attendees:
  • EGEE operations team
  • EGEE ROC managers
  • site representatives (optional)
  • GGUS representatives
  • VO representatives
To dial in to the conference:
    a. Dial +41227676000
    b. Enter access code 0148141

    AND click HERE
    (Please specify your name & affiliation in the web-interface)

    Click here for minutes of all meetings

    Click here for the List of Actions

      • 4:00 PM 4:20 PM
        EGEE Items 20m
        • Central Grid-Operator-on-Duty (c-COD) handover
          From Italy to France
          Handover Log:
          This week there were:
          • some expired tickets (ROD_CANADA, ROD_ICALG) that were solved
          • others whose expiration date was extended due to downtimes (ROD_NE)
          • another expired ticket that has received no answer: ROC_CANADA - #54707 (APEL)
          • an expired ticket for SAMPA (ROC_LA) that seems to have lost its connection with the alarm; no action has been taken by ROD_LA
          • 2 tickets for CERN_PROD, both apparently due to middleware problems: #53931 (CREAM-CE) has a "suggested fix" that was not applied; #54424 is an APEL problem and has expired
        • Pilot Services Report & Issues
          Info about active pilot services at:
          https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingPilots
        • gLite Release News
        • EGEE issues coming from ROC reports

          From ROC DECH:
          LCG2-FZK Service Incident:
          Planned downtime affecting ATLAS: OUTAGE 2010-02-01 8:00 to 2010-02-05 15:00 (UTC). The dCache instance for ATLAS (atlassrm-fzk.gridka.de) will be migrated to Chimera.

          From ROC Russia:
          Wrong version detection command for the LB service: https://savannah.cern.ch/bugs/?61586 . This bug duplicates https://savannah.cern.ch/bugs/?55482 from 2009-09-09 09:59, so it has gone uncorrected for 3(!) months.

        • fixing MPI sites (from the MPI WG) 15m
          Dear Maite (CC. Steven),
          It seems that some sites are already starting to fix their MPI problems today :) We also got a few reactions wondering about this sudden urge to fix MPI site problems now. It would certainly help if the ROCs received an explanatory e-mail about the MPI Task Force mission, including the link to the official documentation on MPI support in EGEE (https://twiki.cern.ch/twiki/bin/view/EGEE/MpiTools), which each ROC should distribute to its sites. Many people are concerned because they have followed other documentation that is also online, coming from SEE Grid and specific to a certain cluster in Budapest.
          There are reasons to be optimistic today, because people are fixing the issues and mpi-start continues to work without any problem in the CREAM CE (see http://indico.ifca.es/indico/getFile.py/access?contribId=10&sessionId=1&resId=1&materialId=slides&confId=249 ).
          However, within the lifetime of EGEE we can probably only fix the current sites and put the documentation in proper order. Anything else, such as new middleware features, will have to wait for future developments. See here for the status of mpi-start: http://indico.ifca.es/indico/getFile.py/access?contribId=2&sessionId=0&resId=0&materialId=slides&confId=249
          Cheers,
          Isabel
          More information about the MPI knowledge DB: http://wiki.ifca.es/e-ciencia/index.php/MPI_Errors
        • Instances of out of date services in the grid 15m
          Attached you can find a list of instances of services that are “out-of-date” according to the “list of supported service versions” wiki page, here: https://twiki.cern.ch/twiki/bin/view/EGEE/SupportedServiceVersions
          more information
      • 4:30 PM 4:35 PM
        Review of Action Items 5m
      • 4:35 PM 4:40 PM
        AOB 5m