WLCG-OSG-EGEE Operations meeting

Europe/Zurich
28-R-15 (CERN conferencing service (joining details below))


Description
grid-operations-meeting@cern.ch
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for sites to get a summary of the weekly WLCG activities and plans.
Attendees:
  • OSG operations team
  • EGEE operations team
  • EGEE ROC managers
  • WLCG coordination representatives
  • WLCG Tier-1 representatives
  • other site representatives (optional)
  • GGUS representatives
  • VO representatives
  • To dial in to the conference:
    a. Dial +41227676000
    b. Enter access code 0140768

    OR click HERE

    NB: Reports were not received in advance of the meeting from:

  • ROCs:
  • VOs:
  • Recording of the meeting
      • 1
        Feedback on last meeting's minutes
        Minutes
      • 2
        EGEE Items
        • a) Grid-Operator-on-Duty handover
          From: Italy / DECH
          To: UK/I / CE


          Report from Italy COD:
          1. Nothing to report.
          Report from DECH COD:
          1. Nothing to report.
        • b) PPS Report & Issues
          PPS reports were not received from these ROCs:
          AP, IT, SEE


          Issues from EGEE ROCs:
          1. None reported.

            The pilot of the WMS at CNAF and CERN is in progress. No major issues have been reported by CMS and ATLAS. The testing period agreed with the VOs expires next Wednesday. If no problems are reported, the SL4 WMS could be released on 29 May.
        • c) gLite Release News

          Release News:

          Now in production

          gLite3.1 Update23 was released to production on 16 May.
          The update, affecting the lcg-CE, contains a new marshal package to fix a security issue found on the CE.
          It is mandatory for sites to upgrade to this version if the improvement packages have been installed on their lcg-CE. Those packages were introduced with gLite 3.1 Update 20.
          All details of the update can be found in:
          http://glite.web.cern.ch/glite/packages/R3.1/updates.asp

          Now in pre-production
          gLite3.0.2 PPS Update49 was released to PPS and is currently being installed at the PPS sites.
          The update contains:
          • lcg-vomscerts-5.0.0 with new host certificate for the VOMS server vo.racf.bnl.gov
            Affected metapackages
            • lcg-RB
            • glite-SE_classic
            • glite-VOBOX
            • glite-WMS
            • glite-LB
            • glite-WMSLB
          After pre-deployment testing, PPS is now upgrading to gLite3.1.0 PPS Update27
          The update contains:
          • Torque (server, client, MPI_utils) with many enhancements and bug fixes
          • Maui (unchanged with new versioning schema)
          • VOMS Admin (affecting VOMS, UI, VOBOX)
            • Updated voms-admin interface documentation.
            • Deprecated old ACL interface methods.
            • Added VOMS-Admin User's guide
            • Improved voms-admin client online documentation
            • bug fixes
          • bug fixes on VOMS server
            • Enabled log rotation on VOMS/VOMS-admin log files (bug 20607)
            • Enabled setting of proxy timeout via configuration (bug 17247)
            • Enabled usage of voms server hostname (--uri parameter) via configuration
          • New version of lcg-info to support multiple BDII endpoints in LCG_GFAL_INFOSYS
          • yaim core (technically affecting all services) removes the check of the Unix permissions of the directory containing the YAIM configuration files
          • New host certificate for VOMS server vo.racf.bnl.gov; affecting:
            • lcg-CE
            • lcg-CE_torque
            • glite-LFC_mysql
            • glite-LFC_oracle
            • glite-SE_dpm_disk
            • glite-SE_dpm_mysql
            • glite-SE_dpm_oracle
          Details in
          https://twiki.cern.ch/twiki/bin/view/EGEE/PPSReleaseNotes_310_PPS_Update27


          Soon in production

          gLite3.0.2 PPS Update43 in preparation
          The update, to be released next Thursday, contains
          • lcg-vomscerts-5.0.0 with new host certificate for the VOMS server vo.racf.bnl.gov
            Affected metapackages
            • lcg-RB
            • glite-SE_classic
            • glite-VOBOX
            • glite-WMS
            • glite-LB
            • glite-WMSLB
            The following metapackages, now supported with gLite version 3.1, are affected as well if still deployed at some sites in version 3.0:
            • lcg-CE
            • lcg-CE_torque
            • glite-LFC_mysql
            • glite-LFC_oracle
            • glite-SE_dpm_disk
            • glite-SE_dpm_mysql
            • glite-SE_dpm_oracle

          gLite3.1.0 PPS Update24 in preparation
          The update, to be released next Thursday, contains
          • lcg-vomscerts-5.0.0 with new host certificate for the VOMS server vo.racf.bnl.gov
            Affected metapackages
            • lcg-CE
            • lcg-CE_torque
            • glite-LFC_mysql
            • glite-LFC_oracle
            • glite-SE_dpm_disk
            • glite-SE_dpm_mysql
            • glite-SE_dpm_oracle
          • Yaim core and yaim lcg-ce 4.0.4 series - Job Priorities implementation
        • d) EGEE issues coming from ROC reports
          1. No items this week

        • e) URGENT upgrade of CA RPMs
          The EUGridPMA has announced a new set of CA RPMs. The EGEE project considers the upgrade to this release *urgent*. Based on this IGTF release, new CA RPMs have been packaged for EGEE. Please upgrade within 1 day. SAM has started a 1-day timeout (including the time needed to complete this CA release procedure). Once the timeout expires, SAM will raise critical errors on the CA tests if old CAs are still detected. See the following page for more details about this new EGEE CA release: http://grid-deployment.web.cern.ch/grid-deployment/lcg2CAlist.html
      • 3
        WLCG Items
        • a) WLCG issues coming from ROC reports
          1. No items this week.
        • b) WLCG Service Interventions (with dates / times where known)
          Link to CIC Portal (broadcasts/news), scheduled downtimes (GOCDB) and CERN IT Status Board

        • GOG-Singapore would like to decommission their site by June 2, 2008. The hardware and services at the site will be shut down permanently. Please migrate any data still needed by your VO before the site is disabled.
          The site currently supports the following VOs: alice, atlas, lhcb, cms, biomed, dteam and ops

        • SARA: On May 21st from 9:00 to 14:00 CET there will be an outage of srm.grid.sara.nl due to network maintenance. This measure is necessary due to the installation of new storage hardware.

        • CYFRONET-IA64: We are going to shut down CYFRONET-IA64 completely at the end of May 2008.
          Please take care of any data you may have on our classic SE: ares03.cyf-kr.edu.pl.

        • BEIJING-LCG2: Our dCache SE atlasse01.ihep.ac.cn is planned to be removed from production after 20th May. Please back up your data before that date.


          Time at WLCG T0 and T1 sites.

  • c) CCRC'08 Operational Review
    Speaker: Harry Renshall / Jamie Shiers
  • d) ALICE report
  • e) ATLAS report
  • f) CMS report

    • Progress and issues for the iCSA/CCRC activities are reported by mail and HNs; it is still hard to also keep the ELOGs up to date on a daily basis. However, the iCSA/CCRC tests are now running more stably overall, so we are catching up by extracting items from HNs/mails and filling in the ELOGs (with the original dates), the needed tickets, and https://twiki.cern.ch/twiki/bin/view/CMS/CCRC08-Phase2-OpsElog (bookmark and check back). Highlighted activities at the moment: analysis of the T1-T1 tests from last week; extension to non-regional T1-T2 transfer tests; production transfers with latency measurements to prepare for T1 workflows; T1 workflows consisting of (iCSA) re-processing and (CCRC) skimming at T1 sites, also exploiting non-custodial areas; final development on the monitoring side to accommodate feedback from CCRC running.
    Speaker: Daniele Bonacorsi
  • g) LHCb report
  • 4
    OSG Items
    Speaker: Rob Quick (OSG - Indiana University)
    • a) Discussion of open tickets for OSG
      Ticket 33220
  • 5
    Review of action items
    list of actions
  • 6
    AOB