EGEE Operations meeting

Europe/Zurich
28-R-15 (CERN conferencing service (joining details below))

Maite Barroso Lopez (CERN)
Description
grid-operations-meeting@cern.ch
Weekly EGEE infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
Attendees:
  • EGEE operations team
  • EGEE ROC managers
  • site representatives (optional)
  • GGUS representatives
  • VO representatives
To dial in to the conference:
    a. Dial +41227676000
    b. Enter access code 0148141

    AND click HERE
    (Please specify your name & affiliation in the web-interface)

    Click here for minutes of all meetings

    Click here for the List of Actions

      • 4:00 PM 4:20 PM
        EGEE Items 20m
        • Central Grid-Operator-on-Duty (c-COD) handover
          From Northern Europe to Italy
          Handover Log:
          ROC CERN let 3 tickets expire over the weekend; one of them had in fact already expired last Thursday.
          2 tickets are older than 30 days: both concern APEL problems in ROC AP, with APEL support involved in both. Both of them expire today.
          Otherwise it was a very quiet week. No WLCG items, I would say.
        • PPS Report & Issues
          Please find Issues from EGEE ROCs and general info in:
          https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingPps
        • gLite Release News
          Please find gLite release news in:
          https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingGliteReleases

          • Rollback of UPDATE 57 for gLite 3.1 due to critical problems in BDII. The problems were detected through SAM failures and GGUS tickets from 2 sites.
            Approximately 30 CEs were not being published because the BDII service on those machines was in a stopped state. In addition, one site reported that its top-level BDII was missing entries from a site that was publishing information correctly.
            The BDII failures on the CEs occurred because the conditional restart of the service failed. This was due to a problem in the already-installed rpm that was only triggered by the update. Most of the affected sites had auto-update enabled and hence updated the package automatically.
            The problem with the top-level BDII arose because the gLite 3.1 release was missing an rpm that was submitted for release on gLite 3.2 (patch 3154) but was never submitted for release on gLite 3.1. During certification on gLite 3.1 the problem was not detected, as it only shows up in a scenario where a particular combination of versions is used together. This scenario was not captured. In PPS the problem was not detected because a clean deployment test for patch #3204 was not done there. This was due to an unclean rejection of a previous patch in PPS, which left some machines in an inconsistent state.
            The release was rolled back while the problems are being fixed. The other components included in the release were not affected, but were also rolled back, and will be made available again today in a new UPDATE.
            A detailed post-mortem is under way to identify corrective actions and avoid similar cases in the future.
        • EGEE issues coming from ROC reports
          AP, France, DECH and Russia hadn't validated their reports by 15:00.

          Italy:
          Which version of each Storage Element implementation is required to comply with the "Usage of Glue Schema v1.3 for WLCG Installed Capacity information"? As ROC, we could push and follow up the upgrade of older versions and validate the published data.
          The Baseline versions of services and client tools for WLCG operations (https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions) seem to have been last updated on 02-Jun-2009. This useful page should be updated more frequently (at every gLite update?) to ensure that the recommendations are not out of date.

        • Grid Service Interventions
          Link to CIC Portal (broadcasts/news), scheduled downtimes (GOCDB) and CERN IT Status Board
          Please consult the URLs above for details.

        • Miscellaneous 15m
          • Reminder for sites to move to WMS 3.2 (available in gLite repository). This must be done by the end of October!
          • The RB service was made obsolete quite some time ago and should no longer be run in the production infrastructure. Please decommission the RB and install an up-to-date WMS as soon as possible. 7 sites are still running one: GILDA-INFN-CATANIA, HG-01-GRNET, IFCA-LCG2, IFIC-LCG2, JP-KEK-CRC-01, RRC-KI, TR-01-ULAKBIM
          more information
      • 4:30 PM 4:35 PM
        Review of Action Items 5m
      • 4:35 PM 4:40 PM
        AOB 5m