WLCG-OSG-EGEE Operations meeting

Europe/Zurich
28-R-15 (CERN conferencing service (joining details below))

Description
grid-operations-meeting@cern.ch
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
Attendees:
  • OSG operations team
  • EGEE operations team
  • EGEE ROC managers
  • WLCG coordination representatives
  • WLCG Tier-1 representatives
  • other site representatives (optional)
  • GGUS representatives
  • VO representatives
  • To dial in to the conference:
    a. Dial +41227676000
    b. Enter access code 0140768

    OR click HERE

    NB: Reports were not received in advance of the meeting from:

  • ROCs:
  • VOs:
      • 16:00 16:00
        Feedback on last meeting's minutes
        Minutes
      • 16:01 16:30
        EGEE Items 29m
        • Grid-Operator-on-Duty handover
          From: Italy / SW Europe
          To: Central Europe / France


          Issues from SW Europe COD:
          1. No major issues to raise.
          Issues from Italian COD:
          1. No major issues to raise.
        • PPS Report & Issues
          PPS reports were not received from these ROCs:

          Issues from EGEE ROCs:
          1. None.
        • EGEE issues coming from ROC reports
          1. (ROC CE): It looks like we have a central problem with accounting data. The list of sites not publishing accounting data contains about 40 sites that suddenly stopped publishing in Dec 2007: http://www3.egee.cesga.es/acctenfor/nodata.php
            Some sites in CE reported problems with APEL similar to this bug: https://savannah.cern.ch/bugs/?32435
            Could the APEL people comment on that?

          2. (ROC CE): When can we expect the MON box on SL(C)4? For sites using SL4, this is one of the SL3 dependencies.
        • gLite Release News
      • 16:30 17:00
        WLCG Items 30m
        • WLCG issues coming from ROC reports
          None.
        • WLCG Service Interventions (with dates / times where known)
          Link to CIC Portal (broadcasts/news), scheduled downtimes (GOCDB) and CERN IT Status Board
          1. None.

          Time at WLCG T0 and T1 sites.

        • FTS service review 5m

          Please read the report linked to the agenda.
          In particular ?

          Speakers: Gavin McCance (CERN), Steve Traylen
        • ATLAS service
          See also https://twiki.cern.ch/twiki/bin/view/Atlas/TierZero20071 and https://twiki.cern.ch/twiki/bin/view/Atlas/ComputingOperations for more information.

          1. Storage Space:
            Each site should publish in the Information System updated information in the following fields:
            • GlueSAStateAvailableSpace
            • GlueSATotalOnlineSize
            • GlueSAUsedOnlineSize
            for:
            • each storage area with space tokens associated
            • each storage area associated with "default spaces" for a given storage class
            This information is crucial for CCRC08.
            Thanks in advance.
          2. SE/SRM SAM critical tests for the BNL Tier-1 have been failing since mid-December.
            GridView Results.
          3. ATLAS would like to know the status and schedule for srmls on lxplus: right now it is deployed only for CERN PPS.
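
As an illustration of the request in item 1 above, a site could check that each storage area it publishes carries the three requested fields. The following is a minimal sketch only: it assumes the site's Information System output has already been dumped to LDIF-like text (how that dump is obtained is not shown), it ignores LDIF line folding and base64 values, and the sample entries, host name, token names and numbers are all hypothetical.

```python
# Attributes ATLAS asks every storage area to publish (from item 1 above).
REQUIRED = (
    "GlueSAStateAvailableSpace",
    "GlueSATotalOnlineSize",
    "GlueSAUsedOnlineSize",
)


def parse_ldif(text):
    """Split simple LDIF-like text into a list of {attribute: [values]} dicts.

    Simplification: folded lines and base64 ("::") values are not handled.
    """
    entries, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:  # a blank line ends the current entry
            if current:
                entries.append(current)
                current = {}
            continue
        if line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        current.setdefault(key.strip(), []).append(value.strip())
    if current:
        entries.append(current)
    return entries


def missing_space_fields(entry):
    """Return the required attributes that are absent or not a plain integer."""
    missing = []
    for name in REQUIRED:
        values = entry.get(name, [])
        if not values or not values[0].isdigit():
            missing.append(name)
    return missing


def report(text):
    """Map each storage area (by GlueSALocalID) to its list of missing fields."""
    result = {}
    for entry in parse_ldif(text):
        if "GlueSALocalID" in entry:
            result[entry["GlueSALocalID"][0]] = missing_space_fields(entry)
    return result


# Hypothetical sample: one complete storage area and one incomplete one.
SAMPLE = """\
dn: GlueSALocalID=atlas:example-token,GlueSEUniqueID=se.example.org,mds-vo-name=resource,o=grid
GlueSALocalID: atlas:example-token
GlueSAStateAvailableSpace: 1000
GlueSATotalOnlineSize: 10
GlueSAUsedOnlineSize: 4

dn: GlueSALocalID=atlas:default,GlueSEUniqueID=se.example.org,mds-vo-name=resource,o=grid
GlueSALocalID: atlas:default
GlueSAStateAvailableSpace: 500
"""

if __name__ == "__main__":
    for area, missing in report(SAMPLE).items():
        print(area, "OK" if not missing else "missing: " + ", ".join(missing))
```

The check only verifies presence and a numeric form; whether the published values are up to date still has to be judged by the site.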

          Comments from operations team about the above three points.

          1. Publication of StorageSpace does not look to be a released item: dCache and DPM.
          2. SAM Test Results. GGUS 31218. To me at least (SteveT), the ticket is wrong for the problem. The problem is "No space for atlas".
          3. It is apparently already there? /afs/cern.ch/project/gd/LCG-share/current/d-cache/srm/bin/srmls
        • CMS service
          • Item 1
          Speaker: Mr Daniele Bonacorsi (CNAF-INFN BOLOGNA, ITALY)
        • LHCb service
          1. rfio problems at CNAF (and now also at RAL). The problem (a hanging connection when a file on the SE is read from the WN using the rfio protocol) is under investigation by the CASTOR people with support from the CNAF people. However, since CNAF has been out of the production mask for months now (and its accounting is suffering), we are looking for the shortest way to get it fixed: accessing files through rootd rather than through rfiod. This has been proven to work at CERN (where it is happily used).

            With this report I'd like to recall this issue (which heavily penalizes the LHCb computing mask) and to set some actions that should be addressed consistently:
            1. CASTOR people + CNAF people to debug the rfio problem.
            2. CNAF people to install, configure and test rootd. They have the support of the FIO and CASTOR people at CERN, and this is foreseen for this week.
            3. If the recipe works at CNAF, involve the RAL people for point 2.
          Speaker: Dr Roberto Santinelli (CERN/IT/GD)
        • ALICE service
          • Item 1
          Speaker: Dr Patricia Mendez Lorenzo (CERN IT/GD)
        • WLCG Service Coordination
          • Item 1
          Speaker: Harry Renshall / Jamie Shiers
      • 17:00 17:30
        OSG Items 30m
        Speaker: Rob Quick (OSG - Indiana University)
      • 17:30 17:35
        Review of action items 5m
        list of actions
      • 17:35 17:35
        AOB
        1. Item 1