WLCG-OSG-EGEE Operations meeting

Europe/Zurich
28-R-15 (CERN conferencing service (joining details below))


John Shade (CERN)
Description
grid-operations-meeting@cern.ch
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
Attendees:
  • OSG operations team
  • EGEE operations team
  • EGEE ROC managers
  • WLCG coordination representatives
  • WLCG Tier-1 representatives
  • other site representatives (optional)
  • GGUS representatives
  • VO representatives
To dial in to the conference:
    a. Dial +41227676000
    b. Enter access code 0148141

    OR click HERE
    (Please specify your name & affiliation in the web-interface)

    Click here for minutes of all meetings

    Click here for the List of Actions

    • Monday, 3 August
      • 1
        EGEE Items
        • a) Central Grid-Operator-on-Duty (c-COD) handover
          From North East to Italy

          The case of IN-DAE-VECC-EUINDIAGRID (GGUS:49754) has to be discussed at today's weekly meeting. SEE experienced severe problems with their local dashboard last week, but it should be back to normal now. ROD CERN seems to be very sloppy in its alarm handling.

          Comment from CERN: I (SteveT) really could not work out how to raise a ticket on a problem. More generally, a clash with a training course caused problems.

          Operational problems encountered: the handover logs at https://cic.gridops.org/index.php?section=roc&page=dashboard&subpage=handover_new are not sorted by date, which can be confusing.

        • b) PPS Report & Issues
          Please find issues from EGEE ROCs and general info at:
          https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingPps
          1. Nothing this week?
        • c) gLite Release News
          Please find gLite release news at:
          https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingGliteReleases

          1. Nothing yet.
        • d) EGEE issues coming from ROC reports
          1. FZK:
            Downtime FZK:
            On Tuesday 11/08/2009, the latest Oracle patches will be applied to the FTS and LFC database back-end as well as to the ATLAS and LHCb 3D databases. The corresponding services are considered to be "at risk" for less than 4 hours each; the starting times will be communicated soon.
          2. Last week SouthEast asked about GGUS:50466 concerning the proxy-renew-daemon. This was raised at the EMT; the developer in question will be prompted, and the item is now on the EMT's list of tracked issues. Please prompt again if no update appears in two weeks.
        • e) Grid Service Interventions
          Link to CIC Portal (broadcasts/news), scheduled downtimes (GOCDB) and CERN IT Status Board
          Please consult the URLs above for details.

        • f) Very old WN installations out there.
            The SAM CE tests include the replication of a file from the site's default SE for "ops" to a central SE, for which a DPM node at CERN is the default choice and thereby almost always used. This node still runs SLC3, which is no longer supported and must be upgraded. Testing on SLC4 with the latest DPM version for gLite 3.1 shows that if the switch were made today, the sites below would be affected.

            Therefore, this is a call to all sites that are failing the CE tests in the SAM Validation instance to upgrade their WNs to the latest version ASAP. We would like to migrate the SAM DPM production instance in ~3 weeks from now (by Monday 10th of August).

            All ROCs, please follow this up with your sites, if need be by opening GGUS tickets. Thanks in advance.

            Thank you to all sites that have updated their WN versions. The list below is a lot shorter than the 42 sites previously listed.

            The current (August 3rd) list of affected sites is:

            • RU-Protvino-IHEP ce0001.m45.ihep.su 2.7.0
            • SPACI-CS-IA64 square.hpcc.unical.it 2.7.0
            • EENet kriit.eenet.ee 3.0.2
            • HK-HKU-CC-01 ce.grid.hku.hk 3.0.2
            • JP-KEK-CRC-01 dg10.cc.kek.jp 3.0.2
            • Taiwan-IPAS-LCG2 atlasce.phys.sinica.edu.tw 3.0.2
            • Taiwan-NCUCC-LCG2 ce.cc.ncu.edu.tw 3.0.2
            • TW-NTCU-HPC-01 host001.hpc.ntcu.edu.tw 3.0.2
            • UKI-LT2-RHUL ce1.pp.rhul.ac.uk 3.0.2
            • UNI-PERUGIA ce.grid.unipg.it 3.0.2
            A live feed of these results is available: XML format

            SAM production results: Production Results

            SAM Validation results: Validation Results

        • 2
          OSG Items
          Speakers: Maria Dimou, Rob Quick
        • 3
          Review of Action Items
        • 4
          AOB