WLCG-OSG-EGEE Operations meeting

Europe/Zurich
28-R-15 (CERN conferencing service (joining details below))


Nick Thackray
grid-operations-meeting@cern.ch
Description
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. Reported issues are discussed, assigned to the relevant teams, followed up, and escalated when needed. The meeting is also the forum for sites to get a summary of the weekly WLCG activities and plans.
Attendees:
  • OSG operations team
  • EGEE operations team
  • EGEE ROC managers
  • WLCG coordination representatives
  • WLCG Tier-1 representatives
  • other site representatives (optional)
  • GGUS representatives
  • VO representatives
To dial in to the conference:
  a. Dial +41227676000
  b. Enter access code 0140768

    Click here for minutes of all meetings

    Click here for the List of Actions

    Recording of the meeting
      • 16:00 16:01
        Feedback on last meeting's minutes 1m
      • 16:01 16:30
        EGEE Items 29m
        • Grid-Operator-on-Duty handover
          From: UK/Ireland and CentralEurope
          To: Taiwan and France

          Report from UKI COD:
          • #8637 - couldn't get SAM results
          • #8907 - site removed from GOCDB, but SAM tests still available -> unsolvable
          • no other major issues

          Report from CE COD:
          • No issues for this week.
        • PPS Report & Issues
          Please find Issues from EGEE ROCs and general info in:

          https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingPps
        • gLite Release News
          Please find gLite release news in:

          https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingGliteReleases

          Now in Production:
          Now in PPS:


          Soon in Production:
        • EGEE issues coming from ROC reports
          • French ROC report:
            1. Concerning the centralized distribution of gLite client software to EGEE sites, the site answers (5/15) were mainly disapproving. A common concern is that troubleshooting would become more complicated, because this mechanism introduces a third party (SA3). Other concerns are technical, for example the overload of NFS/AFS and the difficulty of taking site-specific configuration into account (dCache, rfio, MPI, etc.).
          • Germany/Switzerland
            1. Request from a DECH site: Is there a timescale for LCG to integrate the latest VDT patches? The one of interest is the client upgrade for gridftp, as it solves a lot of issues.
              Answer: The present distribution includes VDT 1.6, with gridftp2-compatible clients. If important updates are needed, we will look at them and backport them (VDT is now at 1.10).
          • UKI
            1. GGUS#40608 submitted regarding the Gridview problems experienced on Saturday 6 Sep and Sunday 7 Sep (unscheduled downtime was not taken into account).
        • Top BDII Publishing 15m
          A collection of Top BDIIs that are publishing is visible here. If you have a gLite 3.1 top level BDII then it should appear on this page. Please check.
      • 16:30 17:00
        WLCG Items 30m
        • WLCG issues coming from ROC reports
        • WLCG Service Interventions (with dates / times where known)
          Link to CIC Portal (broadcasts/news), scheduled downtimes (GOCDB) and CERN IT Status Board

          Many interventions scheduled this week. Please consult the URLs above for details.

          Time at WLCG T0 and T1 sites.

        • WLCG Operational Review
          Speaker: Harry Renshall / Jamie Shiers
        • ALICE report
        • ATLAS report
        • CMS report

          • T0 workflows: In running mode. Highlights from the weekend. On Saturday: several small runs with special tests, no processing failures observed (just one run got stuck in the DAQ, still repacked to 100% though), then a couple of long runs (/BeamHalo and /Cosmics, all in apart from Pixel, Tracker, ECAL endcap). Activated new offline DQM harvesting. ALCARECO migrated to global DBS, injected into PhEDEx and subscribed to CAF and CERN_MSS. Transfers ongoing with no major problems. Some blind regions in Lemon monitoring (reported by shifters) [*1]. --- On Sunday: some more long cosmic runs; some stay in PromptReco for a long time (but e.g. one took 4.4E6 cosmics events). --- This morning: a couple of hours of slower data taking due to set-up problems in the trigger chain (DTTF), now OK.
          • T1 workflows: ASGC: Typhoon in Taipei. --- IN2P3: issue with the transfer of a custodial /Cosmics sample, seems to be related to PhEDEx Ops issues; a first diagnosis is available and will be out soon, being tracked internally as [*2]. --- FZK: small temporary glitches in SE/CE SAM tests, may be related to the problems they had during the weekend with the power supply of the dCache system (file-open errors were triggered).
          • T2 workflows: some CMS JobRobot failures at some T2s, sites informed as appropriate. --- CMSSW installation problem at T2_UK_London_IC: being addressed.

            [*1]
            https://lemonweb.cern.ch/lemon-status/info.php?time=0.0.5&offset=0&entity=c2cms%252Ft1transfer&cluster=1&type=host
            https://lemonweb.cern.ch/lemon-status/info.php?time=0.0.5&offset=0&entity=c2cms%252Ft0export&cluster=1&type=host
            https://lemonweb.cern.ch/lemon-status/info.php?time=0.0.5&offset=0&entity=c2cms%252Ft0input&cluster=1&type=host
            http://lemonweb.cern.ch/lemon-status/info.php?time=0.0.5&offset=0&entity=c2cms%252Fcmscaf&cluster=1&type=host

            [*2]
            http://savannah.cern.ch/support/?105610

          Speaker: Daniele Bonacorsi
        • LHCb report
        • Storage services: Recommended base versions
          The recommended baseline versions for the storage solutions can be found here: https://twiki.cern.ch/twiki/bin/view/LCG/GSSDCCRCBaseVersions

        • Storage services: this week's updates
      • 17:00 17:30
        OSG Items 30m
        Speaker: Rob Quick (OSG - Indiana University)
        • Discussion of open tickets for OSG
          • https://gus.fzk.de/ws/ticket_info.php?ticket=37059
      • 17:30 17:35
        Review of action items 5m
      • 17:35 17:36
        AOB 1m