WLCG-OSG-EGEE Operations meeting

28-R-15 (CERN conferencing service (joining details below))



Nick Thackray
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
  • OSG operations team
  • EGEE operations team
  • EGEE ROC managers
  • WLCG coordination representatives
  • WLCG Tier-1 representatives
  • other site representatives (optional)
  • GGUS representatives
  • VO representatives
  • To dial in to the conference:
    a. Dial +41227676000
    b. Enter access code 0140768


      • 16:00 16:00
        Feedback on last meeting's minutes
      • 16:01 16:30
        EGEE Items 29m
        • Grid-Operator-on-Duty handover
          From: France / AsiaPacific
          To: France / France

          Report from France COD:
          1. The node lcg-bdii.gsi.de was removed from the GOC DB on August 6th but is still tested by SAM.
            Answer: The node still exists in the GOC DB.

          Report from Asia Pacific COD:
          1. No issues this week.
        • PPS Report & Issues
          1. None this week.
        • gLite Release News
          Now in Production
          • No new updates since last week.

          Now in PPS
          • No new updates since last week.

          Soon in Production
          • gLite 3.1 Update 28 is in preparation. This update has been delayed due to issues with the release process but will be released within the next few days. The release contains:
            • glite-CONDOR_utils for lcg-CE (PATCH:1856)
            • New version of the gsoap plugin with a vulnerability fix (affecting LB, WMS, UI, WN, VOBOX, CE) (PATCH:1846)
            • Several bug fixes on WMS and clients (PATCH:1780)
            • New Short Lived Credential Service (SLCS), which allows users to obtain a short-lived personal certificate based on a Shibboleth AAI identity (PATCH:1693)
            • MyProxy version 1.6.1-7 (fixes a build issue related to the globus flavour; already deployed in production) (PATCH:1978)
            • Various improvements to lcg-extra-jobmanagers (CE) (PATCH:1942)
            • GFAL and lcg_util update with the new function gfal_removedir and several bug fixes
            • FTS SL4 release (32 and 64 bit)
        • EGEE issues coming from ROC reports
          1. No items.
      • 16:30 17:00
        WLCG Items 30m
        • WLCG issues coming from ROC reports
          1. None.
        • End points for FTM service at tier-1 sites
          The list of FTM end-points we have so far is:
          • ASGC: http://w-ftm01.grid.sinica.edu.tw/transfer-monitor-report/
          • BNL: ???
          • CERN: https://ftsmon.cern.ch/transfer-monitor-report/
          • FNAL: https://cmsfts3.fnal.gov:8443/transfer-monitor-report/
          • FZK: http://ftm-fzk.gridka.de/transfer-monitor-report/
          • IN2P3: http://cclcgftmli01.in2p3.fr/transfer-monitor-report/
          • INFN: https://tier1.cnaf.infn.it/ftmmonitor/
          • NDGF: ???
          • PIC: http://ftm.pic.es/transfer-monitor-report/
          • RAL: no endpoint in production yet
          • SARA/Nikhef: http://ftm.grid.sara.nl/transfer-monitor-report
          • TRIUMF: http://ftm.triumf.ca/transfer-monitor-report/
        • WLCG Service Interventions (with dates / times where known)
          Link to CIC Portal (broadcasts/news), scheduled downtimes (GOCDB) and CERN IT Status Board
          1. RAL: CMS downtime for upgrade of Castor to 2.1.7. Tuesday, 12 August, from 07:30 UTC+1 to 15:30 UTC+1. Affected nodes:
            • lcgce03.gridpp.rl.ac.uk
            • lcgce04.gridpp.rl.ac.uk
            • srm-cms.gridpp.rl.ac.uk

          2. CERN: CASTOR 2.1.7-14 upgrade. Tuesday, 12 August, from 09:00 UTC+2 to 11:30 UTC+2. Affected nodes:
            • srm-cms.cern.ch

          3. RAL: ATLAS downtime for upgrade of Castor to 2.1.7. Wednesday, 13 August, from 07:30 UTC+1 to 15:30 UTC+1. Affected nodes:
            • lcgce03.gridpp.rl.ac.uk
            • srm-atlas.gridpp.rl.ac.uk
            • lcgce05.gridpp.rl.ac.uk

          4. CERN: CASTOR 2.1.7-14 upgrade. Wednesday, 13 August, from 09:00 UTC+2 to 11:30 UTC+2. Affected nodes:
            • srm-lhcb.cern.ch

          5. RAL: LHCb downtime for upgrade of Castor to 2.1.7. Thursday, 14 August, from 07:30 UTC+1 to 15:30 UTC+1. Affected nodes:
            • lcgce05.gridpp.rl.ac.uk
            • lcgce04.gridpp.rl.ac.uk
            • srm-lhcb.gridpp.rl.ac.uk

          6. CERN: CASTOR 2.1.7-14 upgrade. Thursday, 14 August, from 09:00 UTC+2 to 11:30 UTC+2. Affected nodes:
            • srm-alice.cern.ch

          7. MPPMU: Our Classic SE (grid-se.rzg.mpg.de) is scheduled to be removed from production on 28 August. Please back up your data before that date.

          Time at WLCG T0 and T1 sites.
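
The interventions above are quoted in two different local offsets (UTC+1 for RAL, UTC+2 for CERN), so comparing them requires a small conversion. The following is a minimal sketch of that arithmetic, not an official tool. The year 2008 is an assumption inferred from the weekdays (the agenda does not state it), and the helper names `to_utc` and `overlaps` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

YEAR = 2008  # assumption: inferred from the weekdays, not stated in the agenda


def to_utc(day, hhmm, offset_hours):
    """Convert a local time on a given day of August to a UTC datetime."""
    local_tz = timezone(timedelta(hours=offset_hours))
    hour, minute = map(int, hhmm.split(":"))
    local = datetime(YEAR, 8, day, hour, minute, tzinfo=local_tz)
    return local.astimezone(timezone.utc)


def overlaps(a, b):
    """Return True if two (start, end) windows share any time."""
    return a[0] < b[1] and b[0] < a[1]


# (label, day of August, start, end, UTC offset in hours), as listed above
DOWNTIMES = [
    ("RAL CMS", 12, "07:30", "15:30", 1),
    ("CERN srm-cms", 12, "09:00", "11:30", 2),
    ("RAL ATLAS", 13, "07:30", "15:30", 1),
    ("CERN srm-lhcb", 13, "09:00", "11:30", 2),
    ("RAL LHCb", 14, "07:30", "15:30", 1),
    ("CERN srm-alice", 14, "09:00", "11:30", 2),
]

# Map each label to its (start, end) window expressed in UTC
WINDOWS = {
    label: (to_utc(day, start, offset), to_utc(day, end, offset))
    for label, day, start, end, offset in DOWNTIMES
}

if __name__ == "__main__":
    for label, (start, end) in WINDOWS.items():
        print(f"{label:15s} {start:%d %b %H:%M} to {end:%H:%M} UTC")
```

Under these assumptions the RAL windows run from 06:30 to 14:30 UTC and the CERN windows from 07:00 to 09:30 UTC, so on each of the three days the CERN intervention falls inside the RAL one.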

        • WLCG Operational Review
          Speaker: Harry Renshall / Jamie Shiers
        • Alice report
        • Atlas report
        • CMS report
          Speaker: Daniele Bonacorsi
          1. CMS has finished 3 of 4 mid-week Global Run exercises, as planned. Each exercise lasts 1.5 days. From the computing standpoint, they have been a valuable set of exercises for checking the full T0 workflow, including (relatively) recently deployed components, from P5 down to the transfer of data to T1 sites, with custodiality also. We plan to run a CRUZET-4 cosmic run exercise in the time slot August 18th-25th, most probably continuing after that with the magnetic field on. From the computing standpoint, and from the support and shift experience, we are trying to use these exercises to prepare for a season of constant data flow.
          2. Problems found: some network issues at P5 and some DB-related interventions and issues, all properly documented and already discussed with CERN-IT in several fora. Progress can be made in the identified areas, and work is being done especially on the communication flows. CRUZET-4 will be a chance to test, for several days in a row, a quasi-real-life scenario for cosmic data flow and, more generally, for some computing workflows.
          3. NOTE: agreement was reached with Castor@CERN to upgrade the CMS instance of Castor@CERN to version 2.1.7-14 tomorrow, Tuesday, August 12th, 09h00-11h30 CERN time.
        • LHCb report
        • Storage services: Recommended base versions
          The recommended baseline versions for the storage solutions can be found here: https://twiki.cern.ch/twiki/bin/view/LCG/GSSDCCRCBaseVersions
        • Storage services: this week's updates
          • dCache announced version 1.8.0-16. It will most probably be available in one month. It contains several improvements:
            1. New Information Providers in accordance with the decisions taken by the "Dynamic Megatable" working group
            2. Improved version of Pin Manager, which allows pins to be released per VO
            3. Better performing srmLs
            4. New Pool System with no overcommitted space
            5. Improved srm clients with better handling of command line options
            The CCRC08 branch will continue to be supported.
          • New CASTOR information providers, compliant with the decisions taken by the "Dynamic Megatable" working group, are in validation.
      • 17:00 17:30
        OSG Items 30m
        Speaker: Rob Quick (OSG - Indiana University)
        • Discussion of open tickets for OSG
      • 17:30 17:35
        Review of action items 5m
      • 17:35 17:35