WLCG-OSG-EGEE Operations meeting

Europe/Zurich
28-R-15 (CERN conferencing service (joining details below))

Steve Traylen (CERN)
Description
grid-operations-meeting@cern.ch
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure, based on reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
Attendees:
  • OSG operations team
  • EGEE operations team
  • EGEE ROC managers
  • WLCG coordination representatives
  • WLCG Tier-1 representatives
  • other site representatives (optional)
  • GGUS representatives
  • VO representatives
  • To dial in to the conference:
    a. Dial +41227676000
    b. Enter access code 0140768

    NB: Reports were not received in advance of the meeting from:

  • ROCs: Italy
  • VOs: No VO reports received
  • Click here for minutes of all meetings

    Click here for the List of Actions

      • 4:00 PM - 4:00 PM
        Feedback on last meeting's minutes
      • 4:01 PM - 4:30 PM
        EGEE Items 29m
        • Grid-Operator-on-Duty handover
          From: DE-CH/Russia
          To: UK/I South East Europe


          • Problems with operational tools:
            • GGUS could not send emails after Remedy server upgrade on 16.7.
            • Remaining synchronisation problems between CIC and GGUS on 17.7.
          • no ticket escalation to ops meeting from lead team
          • ticket metrics for last week
          • RO-02-NIPNE, ticket id#8348. The case was transferred to political instances.
        • PPS Report & Issues
          Please find issues from EGEE ROCs and general info at:

          https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingPps
        • gLite Release News
        • EGEE issues coming from ROC reports
          1. Southwest
            We would like to raise the issue of a long-standing ticket that is still not resolved: GGUS #28620
        • Documents for Review 15m
          Reminder
          1. Comments on the draft document about security command line tools, requested by Christoph Witzig (broadcast sent on 07 Jul, "Feedback request: EGEE/OSG joint document about security tools")
          2. Comments on the multi platform support document edited by SA3 and TMB (broadcast sent on 09 Jul, "Feedback request: TMB Proposal on gLite Multi Platform Support")
          Document links were added to last week's minutes.
      • 4:30 PM - 5:00 PM
        WLCG Items 30m
        • Verification of alarm workflow for Tier-1 centres 15m
          • We are doing the first service verification on Thursday the 17th.
          • A brief update on progress will be given today.
          Speaker: Guenter Grein
        • WLCG issues coming from ROC reports
          1. none
        • WLCG Service Interventions (with dates / times where known)
          Link to CIC Portal (broadcasts/news), scheduled downtimes (GOCDB) and CERN IT Status Board

          Time at WLCG T0 and T1 sites.

        • WLCG Operational Review
          Speaker: Harry Renshall / Jamie Shiers
        • Alice report
        • Atlas report
        • CMS report

          • General:
            Status: CRUZET-3 is over. It was a good experience, and cosmic runs are becoming increasingly mature exercises for CMS computing in the Tier-0 data handling sector. CERN services were mostly OK during CRUZET-3. Plans: extend and finalize the tests, and prepare for the next cosmic exercise, foreseen for the second half of August.
          • Tier-0:
            Status: Work on the P5->CERN transfer system has been finalized. A repacker replay has been running since July 17th, redoing the repack for CRUZET-3 data. Plans: Next Monday CMS will start more replays with some real T0 prompt reco testing.
          • Tier-1 sites:
            Status: The CRUZET-3 AlcaReco exercises at T1 sites were the first time CMS ran on *data* outside CERN/FNAL. Plans: CMS foresees transfers of the order of a few TB per T1 site from CERN, starting imminently. IN2P3 is custodial for CRUZET-3 (as CNAF was custodial for CRUZET-2).
          • Tier-2 sites:
            Plans: CMS expects a centrally triggered *big* transfer load of many CSA07 MC datasets to CMS T2s, as a necessary step to complete the migration of user analysis to T2 sites. Each T2 should expect to be asked to host a fraction of ~30 TB of those datasets. The good news is that many of the needed datasets are _already now_ hosted by at least one Tier-2, so the load may be less than what could in principle be expected. Work is being done to maximize the availability of datasets at T2s with the minimal amount of WAN traffic triggered. Subscriptions to T2s and the transfers themselves will start soon, quite possibly this week already, so T2 sites should please be prepared. Once this is finalized, the use of T1s for analysis will start to be banned (i.e. once a dataset is verified to be accessible at one or more T2 sites, it will be masked in DBS at T1 sites, so CRAB jobs will no longer be able to reach T1 sites).
          • More:
            - Data consistency and monitoring campaigns have been running for some weeks.
            - More T2/T3s are coming in and joining CMS, and soon the PhEDEx topology.
          Speaker: Daniele Bonacorsi
        • LHCb report
        • Recommended base versions for storage services:
      • 5:00 PM - 5:30 PM
        OSG Items 30m
        Speaker: Rob Quick (OSG - Indiana University)
        • Discussion of open tickets for OSG
      • 5:30 PM - 5:35 PM
        Review of action items 5m
      • 5:35 PM - 5:35 PM
        AOB