WLCG-OSG-EGEE Operations meeting

Europe/Zurich
28-R-15 (CERN conferencing service (joining details below))


Nick Thackray
Description
grid-operations-meeting@cern.ch
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
Attendees:
  • OSG operations team
  • EGEE operations team
  • EGEE ROC managers
  • WLCG coordination representatives
  • WLCG Tier-1 representatives
  • other site representatives (optional)
  • GGUS representatives
  • VO representatives
To dial in to the conference:
    a. Dial +41227676000
    b. Enter access code 0148141

    OR click HERE
    (Please specify your name & affiliation in the web-interface)

    Click here for minutes of all meetings

    Click here for the List of Actions

    Recording of the meeting
      • 16:00 16:01
        Feedback on last meeting's minutes 1m
      • 16:01 16:30
        EGEE Items 29m
        • Grid-Operator-on-Duty handover
          From: ROC Italy and ROC France
          To: ROC Russia and ROC UK/I


          Report from ROC Italy :
          1. Case transferred to Operation Meeting: GGUS #40700 on BEIJING-CNIC-LCG2-IA64. APEL problem not solved yet, case opened on Sept. 10th.
          2. Case transferred to Operation Meeting: GGUS #42770 on IN-DAE-VECC-02 was transferred to the political instance but, after feedback from the site, returned to the 2nd mail step.

          Report from ROC France :
          1. We have observed that the follow-up of the last escalation step by OCC and ROC was not done correctly.
            More details here: https://twiki.cern.ch/twiki/bin/view/EGEE/OperationalUseCasesAndStatus#9_Last_escalation_step_Site_susp

            We have 2 cases where the last escalation step has lasted for several weeks:
            1. GGUS #40521: RU-Phys-SPbSU (1 month and a half)
              • 25/09/2008: last escalation step
              • 06/10/2008: raised at WLCG Ops meeting
              • 06/11/2008: still in last step and not suspended
              • 06/11/2008: Cyril L'Orphelin (COD-FR) sent mail to Maite, Steve and Nick
              • 06/11/2008: Maite sent mail to Russian ROC
              • 06/11/2008: site suspended by Russian ROC
            2. GGUS #42015: ITPA-LCG2 (3 weeks)
              • 24/10/2008: last escalation step
              • 27/10/2008: raised at WLCG Ops meeting
              • 03/11/2008: raised again at WLCG Ops meeting
              • 07/11/2008: still in last step and not suspended
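The time each ticket has spent in the last escalation step can be checked directly from the dates listed above. The following Python sketch is illustrative only: the dictionary layout and the function name `days_in_last_step` are assumptions made for this example, and the counts run from the "last escalation step" date to the latest dated entry for each ticket, which may not be the reference point used for the durations quoted in the report.

```python
from datetime import date

# Dates copied from the ROC France report (written day/month/year there).
# Each entry: (date the last escalation step was reached, latest dated entry).
cases = {
    "GGUS #40521 (RU-Phys-SPbSU)": (date(2008, 9, 25), date(2008, 11, 6)),
    "GGUS #42015 (ITPA-LCG2)": (date(2008, 10, 24), date(2008, 11, 7)),
}


def days_in_last_step(entered: date, last_seen: date) -> int:
    """Hypothetical helper: whole days between the two dates."""
    return (last_seen - entered).days


durations = {ticket: days_in_last_step(*dates) for ticket, dates in cases.items()}

for ticket, days in durations.items():
    print(f"{ticket}: {days} days in the last escalation step")
```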
        • PPS Report & Issues
          Please find Issues from EGEE ROCs and general info in:

          https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingPps
        • gLite Release News
        • EGEE issues coming from ROC reports
          • ROC France: There are still 2 very urgent tickets open without an answer:
            • Problem with LCG CE: GGUS #42981.
            • Problem with WMS: GGUS #42999. The WMS is not usable in production, which blocks the setup of the ALICE WMS.


          • ROC SEE: We have identified a serious deployment problem for 64-bit WNs (missing x86_64 RPMs and a mix-up of executables and libraries for 32-bit and 64-bit architectures): GGUS #43216. Because of this problem we had to take manual steps to resolve the issue, and we failed SAM tests for several days, which will affect our availability. The error message was very misleading.

          • ROC SWE: The site BDII of the SWE site IFIC-LCG2 no longer appears in the SAM tests (GGUS #43353; Savannah #33616). Is there any update on this problem?
        • RAL-LCG2 batch farm occupancy
          The RAL-LCG2 batch farm has been running at 50% occupancy or less since June. For October, the LHC VOs' total nominal fairshare of the farm was ~73%, but we only saw ~14% utilisation by the LHC VOs. Total occupancy for October was about 34%, with non-LHC VOs (mainly BaBar, biomed, phenogrid) contributing the rest. (Occupancy is measured as utilised KSI2K divided by total KSI2K capacity.) We would like to find out whether the experience of other T1s has been similar over the last few months, or whether the lack of LHC work is specific to RAL-LCG2 and we should investigate further.
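The occupancy definition above is a simple ratio, and the October figures can be restated with it. The Python sketch below is only an illustration: the total capacity of 1000 KSI2K is a made-up value chosen to make the percentages easy to read, the function and variable names are hypothetical, and it assumes the ~14% LHC figure is a fraction of total farm capacity (as the 34% total and the "rest" from non-LHC VOs suggest).

```python
def occupancy(used_ksi2k: float, total_ksi2k: float) -> float:
    """Occupancy as utilised KSI2K divided by total KSI2K capacity."""
    if total_ksi2k <= 0:
        raise ValueError("total capacity must be positive")
    return used_ksi2k / total_ksi2k


# Hypothetical capacity; not the real size of the RAL-LCG2 farm.
TOTAL_KSI2K = 1000.0

lhc_used = 140.0    # corresponds to the reported ~14% LHC utilisation
total_used = 340.0  # corresponds to the reported ~34% total occupancy
lhc_fairshare = 0.73  # reported nominal LHC fairshare (~73%)

lhc_occupancy = occupancy(lhc_used, TOTAL_KSI2K)
total_occupancy = occupancy(total_used, TOTAL_KSI2K)

# "The rest" contributed by non-LHC VOs.
non_lhc_occupancy = total_occupancy - lhc_occupancy

# Fraction of the nominal LHC fairshare that was actually used.
fairshare_used = lhc_occupancy / lhc_fairshare

print(f"LHC occupancy:      {lhc_occupancy:.0%}")
print(f"Non-LHC occupancy:  {non_lhc_occupancy:.0%}")
print(f"LHC fairshare used: {fairshare_used:.0%}")
```

Under these assumptions the non-LHC VOs account for roughly 20% of the farm, and the LHC VOs used roughly a fifth of their nominal fairshare.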
      • 16:30 17:00
        WLCG Items 30m
        • WLCG issues coming from ROC reports
          1. No items this week.
        • WLCG Service Interventions (with dates / times where known)
          Useful links:
          1. CIC Portal for broadcasts and news
          2. Scheduled downtimes (in the GOCDB)
          3. ATLAS site downtime calendar
          4. CERN IT Status Board


          Please consult the URLs above for details about this week's interventions.
          Some selected downtimes:
          1. CERN-PROD: Thursday 13 Nov. 10:00 - 12:00 UTC+1. VOMS. At risk. Transparent intervention on db behind VOMS.

          2. RAL: Tuesday 11 Nov. 09:00 - 12:00 UTC. Several CEs + SRMs. At risk. Castor CIP changes. Also preparation for setting up Castor DLF.

          3. CSC: Now until Thurs 13 Nov. 01:00 UTC+1. SITE OUTAGE. Update to Chimera.

          4. GRIF: Now until today, 16:00 UTC+1. SITE OUTAGE. Shutdown of water cooling system.

          Time at WLCG T0 and T1 sites.

        • WLCG Operational Review
          Speaker: Harry Renshall / Jamie Shiers
        • ALICE report
          1. Encourage all ALICE Tier-1 sites to install a CREAM CE.
        • ATLAS report
          1. Item
        • CMS report
          1. Item
          Speaker: Daniele Bonacorsi
        • LHCb report
          1. Item
        • Storage services: Recommended base versions
          The recommended baseline versions for the storage solutions can be found here: https://twiki.cern.ch/twiki/bin/view/LCG/GSSDCCRCBaseVersions

        • Storage services: this week's updates
      • 17:00 17:30
        OSG Items 30m
        Speaker: Rob Quick (OSG - Indiana University)
        • Discussion of open tickets for OSG
          Ticket for discussion: GGUS #42490
      • 17:30 17:35
        Review of action items 5m
      • 17:35 17:36
        AOB 1m