WLCG-OSG-EGEE Operations meeting

Europe/Zurich
28-R-15 (CERN conferencing service (joining details below))

Nick Thackray
Description
grid-operations-meeting@cern.ch
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
Attendees:
  • OSG operations team
  • EGEE operations team
  • EGEE ROC managers
  • WLCG coordination representatives
  • WLCG Tier-1 representatives
  • other site representatives (optional)
  • GGUS representatives
  • VO representatives
To dial in to the conference:
    a. Dial +41227676000
    b. Enter access code 0148141

    OR click HERE
    (Please specify your name & affiliation in the web-interface)

    Click here for minutes of all meetings

    Click here for the List of Actions

    Recording of the meeting
      • 1
        EGEE Items
        • a) Grid-Operator-on-Duty handover
          From: DECH + SouthEast Europe
          To: CERN + France


          Report from SouthEast Europe:
          • List of unresponsive sites:
            • None
          • Problems Encountered during shift:
            • Again, new alarms were raised for nodes that were already in scheduled downtime (SD).
            • The new version of https://lcg-sam.cern.ch:8443/sam/sam.py?... looks more attractive, but unfortunately it is not clear or easy to deal with cases where an alarm is in ERROR but the last SAM test shows that the corresponding service is still OK.
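As an illustration of the first problem above, here is a minimal sketch of how alarms could be filtered against scheduled downtimes before they reach the operator on duty. The `Downtime` and `Alarm` types, the node names, the test names and the timestamps are all hypothetical; this does not describe how SAM or the operations dashboard is actually implemented.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Downtime:
    node: str          # hypothetical host name
    start: datetime
    end: datetime


@dataclass(frozen=True)
class Alarm:
    node: str
    test: str          # hypothetical test name
    raised_at: datetime


def in_downtime(alarm, downtimes):
    """True if the alarm was raised while its node was in a declared downtime."""
    return any(
        d.node == alarm.node and d.start <= alarm.raised_at <= d.end
        for d in downtimes
    )


def actionable(alarms, downtimes):
    """Keep only the alarms that were not raised during a declared downtime."""
    return [a for a in alarms if not in_downtime(a, downtimes)]


# Illustrative data only.
downtimes = [
    Downtime("ce01.example.org", datetime(2009, 2, 23, 8, 0), datetime(2009, 2, 24, 18, 0)),
]
alarms = [
    Alarm("ce01.example.org", "job-submit", datetime(2009, 2, 23, 12, 0)),
    Alarm("se01.example.org", "file-copy", datetime(2009, 2, 23, 12, 0)),
]
remaining = actionable(alarms, downtimes)
```

With this data, only the alarm for `se01.example.org` would remain for the operator, since `ce01.example.org` was in downtime when its alarm was raised.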
          Report from DECH Europe:
          • List of unresponsive sites:
            • None
          • Problems Encountered during shift:
            • GGUS ticket: 46448
              Site USCMS-FNAL-WC1 is an OSG site, so alarms should not be raised for it. Nevertheless, alarms were raised this week when the site started to publish its resources in a resource group. This seems to be fixed now.
            • GGUS ticket: 46448
              The alarm FTS-infosites on fts-t1import.cern.ch is failing because the middleware does not foresee the production scenario currently in use at CERN. The developers are aware and a bug has been opened (Savannah bug #46083); the alarm will keep failing until the bug is fixed.
        • b) PPS Report & Issues
          Please find issues from EGEE ROCs and general information at: https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingPps

          • Pilot service of WMS 3.1: in progress
            • New instances of WMS based on gLite 3.1 PPS Update 43 have been set up at CNAF, SCAI and CERN-PROD.
            • PATCH:2802 was deployed, introducing a fix to the WMS info provider.
            • Details about the pilot (planning, layout, technical info) can be found on the page https://twiki.cern.ch/twiki/bin/view/LCG/PpsPilotWMS31
            • Details about the individual tasks can be found in the tracker http://www.cern.ch/pps/index.php?dir=./ActivityManagement/SA1DeploymentTaskTracking, which specifically lists the subtasks of TASK:9038.

        • c) gLite Release News
          Please find gLite release news in: https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingGliteReleases

          • gLite 3.1 PPS Update 44 went through the deployment test and is now being installed by the remaining PPS sites. The update contains:
            • New version of CREAM CE (PATCH:2667, PATCH:2669). Among other things, this version provides:
              1. Short-term proxy renewal solution in the CREAM-based CE
              2. Fixes, in particular for BUG:44712 (problem with the lcmaps conf file used for glexec), which is currently affecting Alice
            • [YAIM] glite-yaim-core 4.0.6 with many bug fixes (PATCH:2636, PATCH:2697)
            • [BDII] Default DB cache size reduced to 50 MB for x86_64 (PATCH:2679)
            • [WN] New glite-wn-info command designed to be executed on the WN by a job submitter. It returns information about that worker node to be used in a grid context (PATCH:2757, PATCH:2758)
          • The release of gLite 3.1 Update 41 to production is in preparation. The update, scheduled for the 25th of February, will contain:
            • Update to WMS 3.1 with numerous bug fixes
            • New version of CREAM CE (PATCH:2667, PATCH:2669). Among other things, this version provides:
              1. Short-term proxy renewal solution in the CREAM-based CE
              2. Fixes, in particular for BUG:44712 (problem with the lcmaps conf file used for glexec), which is currently affecting Alice
        • d) EGEE issues coming from ROC reports
          • SWE ROC:
            • We would like to know the status of the new gLite "authorization framework", the "framework to identify local T2 users" at a site.
          • UKI ROC:
            • We received a GGUS ticket (46475) but believe that ticketing should not apply in this case. We are still waiting for feedback on this.
              This CE is flagged as "Not in Production" in the GOCDB. Monitoring is turned on for troubleshooting purposes during commissioning. Our understanding is that GGUS ticketing does not apply in these circumstances.
        • e) Grid Service Interventions
          Link to CIC Portal (broadcasts/news), scheduled downtimes (GOCDB) and CERN IT Status Board

          Please consult the URLs above for details.

          In particular, the following sites requested that these downtimes be reported here:

        • f) Update on SL5
          Speaker: Oliver Keeble (CERN)
      • 2
        WLCG Items
        • a) WLCG issues coming from ROC reports
          1. DECH: FZK-LCG2: New instance for FTS (2.1) is in production. The two instances will run in parallel for some time until all experiments have switched to the new instance.
            The new service name is fts-fzk.gridka.de.
          2. DECH: CMS users with VOMS group /cms/dcms cannot run at CERN and various other sites; see https://gus.fzk.de/ws/ticket_info.php?ticket=46019.
            Not supporting this group, and probably many other groups, makes no sense; otherwise the groups are wasted. In my opinion, when a site supports a VO it should use wildcards to ensure support for all users' proxies. If it does not use wildcards, the queues in the information system should be published only for the supported groups and roles.
            Is there a standard way to deal with this situation? Or is it possible to exclude specific groups or roles in the information system (blacklist)?
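To illustrate the wildcard and blacklist question above, here is a minimal sketch using shell-style pattern matching on VOMS FQANs. The pattern lists and the `is_supported` helper are hypothetical; they do not reflect the actual syntax of any site's mapping files or information-system configuration.

```python
from fnmatch import fnmatchcase


def is_supported(fqan, allowed, blocked=()):
    """True if the FQAN matches an allowed pattern and no blocked pattern."""
    if any(fnmatchcase(fqan, pattern) for pattern in blocked):
        return False
    return any(fnmatchcase(fqan, pattern) for pattern in allowed)


# Hypothetical site configurations, for illustration only.
explicit_only = ["/cms", "/cms/Role=production"]   # no wildcard: subgroups are not covered
with_wildcard = ["/cms", "/cms/*"]                 # wildcard: every /cms subgroup and role

supported_explicit = is_supported("/cms/dcms", explicit_only)
supported_wildcard = is_supported("/cms/dcms", with_wildcard)
blacklisted = is_supported("/cms/dcms", with_wildcard, blocked=["/cms/dcms*"])
```

In this sketch, a site listing only explicit entries rejects a /cms/dcms proxy, the wildcard entry accepts it, and a blocked pattern takes precedence over the wildcard. Whether the middleware offers an equivalent of the `blocked` list is exactly the open question raised above.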
        • b) Wiki page containing FTM Endpoints
          Can all Tier-1 sites please keep the list of FTM endpoints up to date? The list is here: https://twiki.cern.ch/twiki/bin/view/LCG/LCGFTMEndpoints

          Note: This requirement will be replaced by information providers publishing the endpoints into the information system.

        • c) WLCG Operational Review
          Speaker: Harry Renshall / Jamie Shiers
        • d) Alice items
        • e) Atlas items

        • f) CMS items
          1. Please have a look at the daily reports given at WLCG daily calls here.
          Speaker: Daniele Bonacorsi
        • g) LHCb items
        • h) WLCG service recommended baseline versions
          The recommended baseline versions can be found here: https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions
      • 3
        OSG Items
        Speaker: Rob Quick (OSG - Indiana University)
        • a) Discussion of open tickets for OSG
          Information taken from the weekly escalation reports.

          OSG did, indeed, re-remind Felipe Silva to reply to GGUS #45094. What does one do in such cases? Maybe try to contact him offline, in case all GGUS/OSG ticketing systems' notifications end up in his spam folder? The submitter still expects an answer.

          There will be a GGUS-OSG phone meeting; the latest proposed date is March 12th. Invitees and background:

          -------- Original Message --------
          Subject: 3rd attempt to fix a date for a OSG-GGUS meeting on  GOCDB vs OIM entries
          Date: Mon, 23 Feb 2009 15:42:12 +0100
          From: Maria Dimou-Zacharova <Maria.Dimou@cern.ch>
          Reply-To: <Maria.Dimou@cern.ch>
          Organization: CERN
          To: Dantong Yu <dtyu@bnl.gov>, Robert Quick <rquick@iupui.edu>, Guenter Grein <guenter.grein@iwr.fzk.de>, Diana Bosio <Diana.Bosio@cern.ch>, "Ernst, Michael" <mernst@bnl.gov>, Ian Fisk <Ian.Fisk@cern.ch>, Simone Campana <Simone.Campana@cern.ch>, James Casey <James.Casey@cern.ch>, ggus-info <ggus-info@cern.ch>, <cms-grid-support@cern.ch>
          CC: Ruth Pordes <ruth@fnal.gov>, Fred Luehring <luehring@indiana.edu>, Arvind Gopu <agopu@indiana.edu>, Kyle Anthony Gross <kagross@indiana.edu>, "Robert w. Gardner" <rwg@hep.uchicago.edu>
          
          Please say if Thursday March 12th at 16hrs CET suits you before I create
          an agenda and book a room for a meeting on FNAL (in March) and BNL (in
          April) exiting GOCDB and the repercussions of this into ATLAS and CMS
          problem reporting via GGUS Direct Site Notification.
          The previous 2 suggested dates didn't work for some of you.
          
          CMS participation is also important.
          
          yours
          maria
          
          >> Maria Dimou-Zacharova wrote:
          >> >     Dear All,
          >> >
          >> > Please observe the section "Sites / Services round table:" in the WLCG operations meeting notes:
          >> > https://twiki.cern.ch/twiki/bin/view/LCG/WLCGDailyMeetingsWeek090216#Wednesday
          >> >
          >> >
          >> > Please note that the BDII idea was discussed before in
          >> > https://savannah.cern.ch/support/?105819#comment16
          >> > https://savannah.cern.ch/support/index.php?105911#comment1
          >> >
          >> > Then we had a very fruitful meeting in Dec 2008 - notes in
          >> > http://indico.cern.ch/getFile.py/access?resId=0&materialId=minutes&confId=46350
          >> >
          >> >
          >> > As we are making an effort via https://savannah.cern.ch/support/?106927
          >> > to clear the situation, to have all relevant parties involved and
          >> > informed, I am happy to schedule a new tel. meeting - it seems to be
          >> > needed.
          >> >
          >> > yours
          >> > maria 
          
      • 4
        Review of action items
      • 5
        AOB