WLCG-OSG-EGEE Operations meeting

Europe/Zurich
28-R-15 (CERN conferencing service (joining details below))


Description
grid-operations-meeting@cern.ch
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
Attendees:
  • OSG operations team
  • EGEE operations team
  • EGEE ROC managers
  • WLCG coordination representatives
  • WLCG Tier-1 representatives
  • other site representatives (optional)
  • GGUS representatives
  • VO representatives
  • To dial in to the conference:
    a. Dial +41227676000
    b. Enter access code 0140768


    NB: Reports were not received in advance of the meeting from:

  • ROCs: Asia Pacific, Russia, UK/I
  • VOs:
      • 16:00 16:00
        Feedback on last meeting's minutes
        Minutes
      • 16:01 16:30
        EGEE Items 29m
        • Grid-Operator-on-Duty handover
          From: DECH / Russia
          To: South East Europe / Asia Pacific


          Report from DECH COD: Please note the following about these sites:
          1. YerPhI (ticket 26634): There is an open item in the operations meeting. The mentioned ticket was closed during the week but needed to be reopened since the SAM tests still fail frequently. The reason is always that no information about the SE is found in the SE. It is not obvious (at least to me) whether this is caused by a bad network connection or a badly performing info provider. I propose to re-discuss the item at the operations meeting.
          2. Australia-UNIMELB-LCG2 (ticket 34393): The site's SE seems to have been (almost) full for weeks. At the beginning of the week it looked better and the site was put into quarantine. This had to be reverted since the situation has not changed. If there is again no reply to the ticket this week, I propose to escalate it to the operations meeting next week.
          Report from Russian COD:
          1. Nothing to report.
        • PPS Report & Issues
          PPS reports were not received from these ROCs:
          AP IT RU SEE UKI


          Issues from EGEE ROCs:
          1. none reported
        • gLite Release News

          Release News:

          Now in production

          gLite 3.1.0 Update42 was released to production with HIGH priority.
          The update contains:
          • FTS
            • new version of FTA changing the gridFTP session handling (CCRC08)
          • Many services
            • lcg-vomscerts-4.9.0 adds next cert for lcg-voms


          Now in pre-production

          PPS sites are now upgrading to gLite 3.1.0 PPS Updates 23 and 24:
          • WMS LB (SL4): first release to PPS
            • Patch for Bugs 31894, 32200, 29600 (security hole), 32573 (WMS alias)
          • UI/WN/VOBOX
            • edg-gridftp-client-1.2.8 fixes bugs 33205, 27274
            • DPM/LFC v1.6.10
            • R3.1/SLC4/x86_64: DPM/LFC v1.6.10 (64bit)
            • R3.1/i386/SLC4: GFAL & lcg_util update with 5 bugfixes for CCRC08
          • DPM/LFC v1.6.10
            • DICOM back-end service for DPM
            • re-buildable source RPMs
            • support for MacOSX
            • group writable directories when SRM started with umask 0
            • bug fixes
          • CE
            • patch to Globus job manager to improve performance
          PPS sites are now upgrading to gLite 3.0.2 PPS Updates 47 and 48.
          • FTS
            • new version of FTA changing the gridFTP session handling (CCRC)
          • Many services
            • lcg-vomscerts-4.9.0 adds next cert for lcg-voms


          Soon in production

          gLite 3.1.0 PPS Update 20 is in preparation.
          The update, to be released tomorrow, will contain:
          • WMS LB (SL4): first release to PPS
            • Patch for Bugs 31894, 32200, 29600 (security hole), 32573 (WMS alias)
          • UI/WN/VOBOX
            • DPM/LFC v1.6.10
            • R3.1/i386/SLC4: GFAL & lcg_util update with several bugfixes (some of them are requested for CCRC08)
          • DPM/LFC v1.6.10
            • DICOM back-end service for DPM
            • re-buildable source RPMs
            • support for MacOSX
            • group writable directories when SRM started with umask 0
            • bug fixes
          • CE
            • patch to Globus job manager to improve performance
        • Next CIC portal release - IMPORTANT CHANGES
          The next release of the CIC portal is scheduled for Tuesday 22/04. The portal will be offline between 09:00 UTC and 09:30 UTC to allow a safe transition. *This release includes many changes in the global design of the portal.* Menus have been reorganized in order to reduce their size, group functionalities and improve user-friendliness. We are aware that such drastic changes can be disturbing. We'll be happy to help and answer any questions you may have on the new interface. Please address any comments to cic-information@in2p3.fr, or use the "contact us" section of the portal.
        • EGEE issues coming from ROC reports
          1. None in this week's ROC reports.
      • 16:30 17:00
        WLCG Items 30m
        • WLCG issues coming from ROC reports
          1. ROC action follow-up on behalf of the VO. Some time ago ATLAS asked ROCs to follow up with sites on having the proper version of WNs and 100GB of disk space in the SW area. Could we ask VOs to assign a ticket to the ROC in such cases? This was done in the past, e.g.: https://gus.fzk.de/ws/ticket_info.php?ticket=28806 This reduces the number of hops to reach the site, the VO can observe progress, and sites/ROCs can ask the VO questions. In CE we would like to discuss some issues with the VO contact for this action. Points to discuss are: 1) Could the VO make critical the tests which it wants the ROCs to follow up? 2) Could the VO provide documentation for the tests, as exists for the OPS VO? For some sites the relevant SAM tests are failing and we don't know why. It is also not clear whether 100GB of free space is required or whether 100GB of total space for the ATLAS VO is enough.
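          To make the open question about the 100GB requirement concrete, here is a minimal sketch that evaluates both readings (100GB total versus 100GB free) for a software area. It is only an illustration: the function names, the decimal interpretation of "GB", and the use of filesystem-level figures are assumptions, not the actual ATLAS SAM test. Note that `shutil.disk_usage` reports on the whole filesystem holding the path, so it does not reflect per-VO quotas.

```python
import shutil

# Assumption: "100GB" is read as 100 * 10^9 bytes; the requirement does not
# say whether decimal GB or binary GiB is meant.
REQUIRED_BYTES = 100 * 10**9


def check_sw_area(total_bytes, free_bytes, required=REQUIRED_BYTES):
    """Evaluate both possible readings of the 100GB requirement.

    total_ok: the area has at least `required` bytes of capacity in total.
    free_ok:  the area has at least `required` bytes currently free.
    """
    return {
        "total_ok": total_bytes >= required,
        "free_ok": free_bytes >= required,
    }


def check_path(path):
    """Apply check_sw_area to the filesystem that holds `path`.

    Hypothetical helper: this measures the filesystem as a whole,
    not a quota assigned to one VO.
    """
    usage = shutil.disk_usage(path)
    return check_sw_area(usage.total, usage.free)


if __name__ == "__main__":
    # A 150GB area with 40GB free passes one reading and fails the other.
    print(check_sw_area(150 * 10**9, 40 * 10**9))
```

          A site with a 150GB area and 40GB free satisfies the "total" reading but not the "free" reading, which is why the answer from the VO matters for sites whose tests are failing.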
        • WLCG Service Interventions (with dates / times where known)
          Link to CIC Portal (broadcasts/news), scheduled downtimes (GOCDB) and CERN IT Status Board

          1. Alias for IN2P3-CC local LFC will change on Thursday April 24th 09:00 UTC
            from: lfc-atlas.in2p3.fr
            to: lfc-prod.in2p3.fr

            Old alias lfc-atlas.in2p3.fr will:
            - disappear from the information system next Thursday
            - still be available until the end of May

            This change will be transparent.

          2. The Classic SEs at IN2P3-LPC are planned to be removed from production on 15th May:
            - clrauvergridse01.in2p3.fr
            - clrlcgse02.in2p3.fr
            Please back up your data before that date.

          3. ASGC's circuit provider will perform maintenance on the following two links.
            * TW(Taipei) - US(Chicago) - NL(Amsterdam) 2.5Gbps
            * TW(Taipei) - NL(Amsterdam) 10Gbps
            Start time: 2008-04-23 00:00 UTC
            End time: 2008-04-23 02:00 UTC
            Impact: ASGC will use an alternative route (via our peers) to T0/T1.

          4. The old Edinburgh site, ce.epcc.ed.ac.uk, will be retired from use in two weeks' time (1 May 2008). Storage services, via srm.epcc.ed.ac.uk, will be accessible via the new Edinburgh site, ce.glite.ecdf.ed.ac.uk, for some time after this, although the intention is to slowly migrate to newer storage. This means that support for several VOs will be dropped by Edinburgh, as they are not part of UKI-SCOTGRID-ECDF's supported VO list. In particular, these VOs are:
            alice, babar, biomed, cdf, cms, dzero, esr, fusion, geant4, hone, magic, minos, na48, planck, sixt, t2k and zeus
          5. At the start of May, the site egee.man.poznan.pl will be removed from production and shut down. Please back up your data stored on storage elements belonging to this site.


          Time at WLCG T0 and T1 sites.

        • CCRC'08 Operational Review
          Speaker: Harry Renshall / Jamie Shiers
        • Alice report
        • Atlas report
          The sites in the list below still haven't upgraded to the ATLAS requested version of lcg-utils (1.6.7 (SL4)):
          1. France ROC:
            • AUVERGRID
            • IN2P3-LPC
          2. SEE ROC:
            • GR-03-HEPNTUA
            • HG-04-CTI-CEID
            • WEIZMANN-LCG2
          3. Italian ROC:
            • INFN-FIRENZE
            • INFN-LNS
            • INFN-NAPOLI
            • INFN-NAPOLI-PAMELA
            • INFN-ROMA3
          4. NE ROC:
            • PDC
          5. CERN ROC:
            • TORONTO-LCG2
          6. UK/I ROC:
            • UKI-LT2-IC-LeSC
          All the requirements are here https://twiki.cern.ch/twiki/bin/view/LCG/GSSDCCRCBaseVersions
        • CMS report

          • News on Development:
          • Data certification, Processing at the T0:
          • Re-processing:
          • MC production:
          • Data Transfers and Integrity, DDT-2/LT status:
          • LINKs:
          Speaker: Daniele Bonacorsi
        • LHCb report
      • 17:00 17:30
        OSG Items 30m
        Speaker: Rob Quick (OSG - Indiana University)
        • Discussion of open tickets for OSG
      • 17:30 17:35
        Review of action items 5m
        list of actions
      • 17:35 17:35
        AOB