WLCG-OSG-EGEE Operations meeting

Europe/Zurich
28-R-15 (CERN conferencing service (joining details below))


John Shade (CERN)
Description
grid-operations-meeting@cern.ch
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
Attendees:
  • OSG operations team
  • EGEE operations team
  • EGEE ROC managers
  • WLCG coordination representatives
  • WLCG Tier-1 representatives
  • other site representatives (optional)
  • GGUS representatives
  • VO representatives
To dial in to the conference:
    a. Dial +41227676000
    b. Enter access code 0148141

    OR click HERE
    (Please specify your name & affiliation in the web-interface)

    Click here for minutes of all meetings

    Click here for the List of Actions

    • Monday, 27 July
      • 1
        EGEE Items
        • a) Central Grid-Operator-on-Duty (c-COD) handover
          From ROC CentralEurope to North East
          • No issues to be reported.
          • All problems from this week were fixed by RODs.
          Regards, Małgorzata Krakowian
        • b) PPS Report & Issues
          Please find Issues from EGEE ROCs and general info in:
          https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingPps
          1. Nothing this week?
        • c) gLite Release News
            Please find gLite release news in:
            https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingGliteReleases

            1. Nothing yet.
        • d) EGEE issues coming from ROC reports
            1. ROC SouthEast:
              There appears to be a significant problem with the WMS proxy renewal daemon at two of our region's WMSs (wms.ipb.ac.rs and wms-aegis.ipb.ac.rs). Related ticket: GGUS:50466. The ticket has been open since 22/7/2008 and seems to have been untouched so far. Could you please tell us whether there is ongoing work on this issue?
        • e) Grid Service Interventions
            Link to CIC Portal (broadcasts/news), scheduled downtimes (GOCDB) and CERN IT Status Board
            Please consult the URLs above for details.

        • f) gLite 3.0 VOMS servers - VOMS-client incompatibility on the way!

            The following VOMS servers are running gLite 3.0 versions:
            • https://grid12.lal.in2p3.fr:8443/vomses ( 16 VOs )
            • https://voms.gridpp.ac.uk:8443/vomses ( 20 VOs )
            • https://cagraidsvr10.cs.tcd.ie:8443/vomses ( 11 VOs )
            • https://grids13.eng.it:8443/vomses ( 4 VOs )
            • https://voms.kek.jp:8443/vomses ( 8 VOs )
            • https://voms.grid.sinica.edu.tw:8443/vomses ( 6 VOs )
            • https://glite-io.scai.fraunhofer.de:8443/vomses ( 1 VO )
            • https://voms.ndgf.org:8443/vomses ( 9 VOs )
            • https://skurut19.cesnet.cz:8443/vomses ( 6 VOs )

            NB: This list may not be complete.

            As announced on 3 July*, there is a version of the VOMS-client currently in certification which is incompatible with the 1.7.x (gLite 3.0) versions of the VOMS server.

            These VOMS servers must be upgraded as soon as possible.


            * https://cic.gridops.org/index.php?section=roc&page=broadcastretrievalC&step=2&typeb=C&idbroadcast=41703
        • g) Legacy gLite versions
            We have a number of sites running legacy gLite releases, and we should make an effort to follow this up so they upgrade to more recent, supported versions. This is the list:
            Site               Host                        Version
            RU-Protvino-IHEP   ce0001.m45.ihep.su          2.7.0
            SPACI-CS-IA64      square.hpcc.unical.it       2.7.0
            CN-BEIJING-PKU     grid04.phy.pku.edu.cn       3.0.2
            EENet              kriit.eenet.ee              3.0.2
            HK-HKU-CC-01       ce.grid.hku.hk              3.0.2
            JP-KEK-CRC-01      dg10.cc.kek.jp              3.0.2
            Taiwan-IPAS-LCG2   atlasce.phys.sinica.edu.tw  3.0.2
            Taiwan-NCUCC-LCG2  ce.cc.ncu.edu.tw            3.0.2
            TW-NTCU-HPC-01     host001.hpc.ntcu.edu.tw     3.0.2
            UKI-LT2-RHUL       ce1.pp.rhul.ac.uk           3.0.2
            UNI-PERUGIA        ce.grid.unipg.it            3.0.2
            All ROCs, please follow up with the sites in your region and support them in proceeding with the upgrade.

            I think a deadline of 1 month (end of August) is realistic, as we are in the vacation period. After that, I would suggest that sites which have not upgraded be suspended/uncertified until they do so.

            These numbers are obtained via SAM by executing glite-version on your WNs.
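
As a rough illustration of how a ROC might triage such a report, the sketch below parses "site host version" lines like those listed above and flags hosts whose reported version is below a chosen minimum. The helper names and the 3.1.0 threshold are assumptions made for this example; they are not part of SAM or of the glite-version tool.

```python
# Sample input: the site/host/version triples from the list above.
LEGACY_REPORT = """\
RU-Protvino-IHEP ce0001.m45.ihep.su 2.7.0
SPACI-CS-IA64 square.hpcc.unical.it 2.7.0
CN-BEIJING-PKU grid04.phy.pku.edu.cn 3.0.2
EENet kriit.eenet.ee 3.0.2
HK-HKU-CC-01 ce.grid.hku.hk 3.0.2
JP-KEK-CRC-01 dg10.cc.kek.jp 3.0.2
Taiwan-IPAS-LCG2 atlasce.phys.sinica.edu.tw 3.0.2
Taiwan-NCUCC-LCG2 ce.cc.ncu.edu.tw 3.0.2
TW-NTCU-HPC-01 host001.hpc.ntcu.edu.tw 3.0.2
UKI-LT2-RHUL ce1.pp.rhul.ac.uk 3.0.2
UNI-PERUGIA ce.grid.unipg.it 3.0.2
"""


def parse_version(text):
    """Turn a dotted version string such as '3.0.2' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def legacy_hosts(report, minimum="3.1.0"):
    """Return (site, host, version) for every host below `minimum`.

    The default threshold is an assumption for illustration only.
    """
    floor = parse_version(minimum)
    flagged = []
    for line in report.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        site, host, version = fields
        if parse_version(version) < floor:
            flagged.append((site, host, version))
    return flagged


if __name__ == "__main__":
    for site, host, version in legacy_hosts(LEGACY_REPORT):
        print(f"{site}: {host} reports gLite {version}")
```

Comparing versions as integer tuples rather than as strings avoids mis-ordering releases such as 3.10.0 and 3.2.0.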

        • h) Very old WN installations out there
            The SAM CE tests include the replication of a file from the site's default SE for "ops" to a central SE, for which a DPM node at CERN is the default choice and thereby almost always used. This node still runs SLC3, which is no longer supported and must be upgraded. Testing on SLC4 with the latest DPM version for gLite 3.1 shows that if the switch were made today, approximately 10% of Production sites would fail the CE tests due to old versions of GFAL and lcg-utils running on their Worker Nodes! Sites in Italy and UKI are particularly affected.

            For example, RAL has versions of lcg-utils that are over a year old. In the meantime, there have been 18 updates, including 3 marked high priority!

            Therefore, this is a call to all sites that are failing the CE tests in the SAM Validation instance to upgrade their WNs to the latest version ASAP. We would like to migrate the SAM DPM production instance in ~3 weeks from now (by Monday 10th of August).

            So, please, all ROCs, follow this up with your sites - if need be, by opening GGUS tickets. Thanks in advance and best regards,

            There are at least these 45 problematic CEs spread over 36 sites (ordered by ROC):


            • HK-HKU-CC-01 ce.grid.hku.hk
            • JP-KEK-CRC-01 dg10.cc.kek.jp
            • TW-NTCU-HPC-01 host001.hpc.ntcu.edu.tw
            • Taiwan-IPAS-LCG2 atlasce.phys.sinica.edu.tw
            • TORONTO-LCG2 bigmac-lcg-ce2.physics.utoronto.ca
            • CYBERSAR-CAGLIARI ce-cyb.ca.infn.it
            • ESA-ESRIN grid-eo-engine04.esrin.esa.int
            • INFN-BARI gridba2.ba.infn.it
            • INFN-GENOVA grid01.ge.infn.it
            • INFN-MILANO t2-ce-01.mi.infn.it
            • INFN-NAPOLI griditce01.na.infn.it
            • INFN-PISA gridce.pi.infn.it
            • INFN-PISA gridce1.pi.infn.it
            • INFN-PISA gridce2.pi.infn.it
            • INFN-ROMA1-CMS cmsrm-ce01.roma1.infn.it
            • SPACI-CS-IA64 square.hpcc.unical.it
            • UNI-PERUGIA ce.grid.unipg.it
            • BelGrid-UCL ingrid.cism.ucl.ac.be
            • EENet kriit.eenet.ee
            • RU-Protvino-IHEP ce0001.m45.ihep.su
            • RU-Protvino-IHEP ce0003.m45.ihep.su
            • IL-BGU cs-grid1.bgu.ac.il
            • CESGA-EGEE ce.egee.cesga.es
            • CESGA-EGEE ce2.egee.cesga.es
            • UNICAN ce01.macc.unican.es
            • e-ca-iaa e-ce.iaa.es
            • RAL-LCG2 lcgce02.gridpp.rl.ac.uk
            • RAL-LCG2 lcgce03.gridpp.rl.ac.uk
            • RAL-LCG2 lcgce04.gridpp.rl.ac.uk
            • RAL-LCG2 lcgce05.gridpp.rl.ac.uk
            • UKI-LT2-IC-HEP ce00.hep.ph.ic.ac.uk
            • UKI-LT2-RHUL ce1.pp.rhul.ac.uk
            • UKI-LT2-UCL-HEP lcg-ce01.hep.ucl.ac.uk
            • UKI-NORTHGRID-MAN-HEP ce01.tier2.hep.manchester.ac.uk
            • UKI-NORTHGRID-MAN-HEP ce02.tier2.hep.manchester.ac.uk
            • UKI-SCOTGRID-ECDF ce.glite.ecdf.ed.ac.uk
            • UKI-SCOTGRID-ECDF mw05.ecdf.ed.ac.uk

              There could be others that are currently down. Production results: Production Results

              Validation results: Validation Results

        • 2
          OSG Items
          Speakers: Maria Dimou, Rob Quick
        • 3
          Review of Action Items
        • 4
          AOB