WLCG-OSG-EGEE Operations meeting

Europe/Zurich
28-R-15 (CERN conferencing service (joining details below))

Nick Thackray
Description
grid-operations-meeting@cern.ch
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
Attendees:
  • OSG operations team
  • EGEE operations team
  • EGEE ROC managers
  • WLCG coordination representatives
  • WLCG Tier-1 representatives
  • other site representatives (optional)
  • GGUS representatives
  • VO representatives
  To dial in to the conference:
    a. Dial +41227676000
    b. Enter access code 0148141

    OR click HERE
    (Please specify your name & affiliation in the web-interface)

    Click here for minutes of all meetings

    Click here for the List of Actions

      • 16:01 16:30
        EGEE Items 29m
        • Grid-Operator-on-Duty handover
          From: Italy and France
          To: Russia and UK/I


          Report from Italy:
          • List of unresponsive sites (escalated to political instances)
            • First Ops meeting (OCC involved): no sites to report
            • Second Ops meeting (assigned to OCC):
              • SITE NAME: SDU-LCG2 (ROC CERN); GGUS: 45181; Reason: no response from site admin in the last week.

          • Problems Encountered during shift:
            Nothing to report

          • Information for the new COD team:
            Nothing to report.

          Report from France:
          • No report.
        • PPS Report & Issues
          Please find Issues from EGEE ROCs and general info in:

          https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingPps

          SUMMARY:

          • Definition of "early adopters" of gLite releases (staged roll-out). Several services still need to be covered, and volunteers are sought.
            More information about the release testing (early adoption) process and the relevant interfaces is available at: https://twiki.cern.ch/twiki/bin/view/LCG/PPS_Release_Testing
            Services not yet covered, for which volunteers are needed:
            1. glite-WN (plain and re-locatable)
            2. glite-UI (plain and re-locatable)
            3. glite-TORQUE_client
            4. glite-TORQUE_server
            5. glite-TORQUE_utils
            6. glite-CONDOR_utils
            7. glite-LSF_utils
            8. glite-SGE_utils
            9. glite-MON
            10. glite-SE_dpm_disk
            11. glite-MON (registry)
            12. glite-MPI_utils
            13. glite-FTA_oracle
            14. glite-FTM
            15. glite-FTS_oracle
            16. glite-SE_dcache_admin_gdbm
            17. glite-SE_dcache_admin_postgres
            18. glite-SE_dcache_info
            19. glite-SE_dcache_pool
            20. glite-LFC_mysql
            21. glite-LFC_oracle
            22. glite-WMS
            23. glite-LB
            24. glite-CREAM_ce
            25. glite-SE_dpm_mysql
            26. glite-PX
          • Pilot service of glexec/SCAS started
            • kick-off meeting with sites and experiments concerned held on the 5th
            • Minutes in http://indico.cern.ch/conferenceDisplay.py?confId=49840
            • the controlled roll-out of the glexec/SCAS functionality over two T1 sites was decided
            • FZK (Karlsruhe) will start the installation on 9th Feb
            • After a first phase of testing by LHCb and ATLAS, IN2P3 (Lyon) will step in
            • Details about the pilot (planning, layout, technical info) can be found on the page https://twiki.cern.ch/twiki/bin/view/LCG/PpsPilotSCAS
            • Details about the single tasks can be found in the tracker http://www.cern.ch/pps/index.php?dir=./ActivityManagementSA1DeploymentTaskTracking specifically listing the subtasks of TASK:8986
        • gLite Release News
          Please find gLite release news in:

          https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingGliteReleases

          Now in Production
          4th Feb: gLite 3.1 Update 40 and gLite 3.0 Update 45 were released to production. The updates contain an upgrade to lcg-vomscerts-5.3.0. They add 3 new host certificates:

          • cclcgvomsli01.in2p3.fr (biomed + egeode);
          • next cert for vo.racf.bnl.gov (atlas);
          • cert for voms.fnal.gov (cms).
          Release notes in http://glite.web.cern.ch/glite/packages/R3.1/updates.asp and http://glite.web.cern.ch/glite/packages/R3.1/x86_64/updates.asp

          Now in PPS
          3rd Feb: gLite 3.1 PPS Update 43 went through the PPS deployment test and is now being installed by the remaining PPS sites. The update contains:

          • WMS 3.1.102 fixing WMS 3.1.100 already in PPS (PATCH:2562)
          • Upgrade to lcg-vomscerts-5.3.0 (already deployed in production) (PATCH:2745 and PATCH:2746)
          • Bug fixes for WMS UI 3.1 (PATCH:2622)
          • WN: grid-cm-* packages provide worker node configuration monitoring published on the ActiveMQ messaging system (PATCH:2660 PATCH:2661)
          • Upgrade of BDII. The starting cache size used for the Berkeley Database in the BDII has been reduced from 1 GB to 50 MB. This should significantly reduce the memory footprint and still provide the necessary performance. (PATCH:2671)
          • Dependency on mysql-server added to VOMS_mysql (PATCH:2700)
          • New Information Dynamic Plugin and SGE yaim utils fix a vulnerability (http://www.gridpp.ac.uk/gsvg/advisories/advisory-43233.txt)
          Release notes in: https://twiki.cern.ch/twiki/bin/view/EGEE/PPSReleaseNotes_310_PPS_Update43 . Deployment test reports in: http://www.cern.ch/pps/index.php?dir=./release/testreports/gLite3.1.0/gLite3.1.0-PPS-UPDATE43/

          Soon in Production
          Nothing to report.

        • EGEE issues coming from ROC reports
          • Italy: [FOR INFORMATION] INFN-T1 FTM endpoint (FTM endpoint at CNAF): http://tier1.cnaf.infn.it/ftmmonitor/transfer-monitor-report/
            We also use the FTS monitor tool developed by IN2P3, available at: http://tier1.cnaf.infn.it/ftsmonitor/

        • Grid Service Interventions
          Link to CIC Portal (broadcasts/news), scheduled downtimes (GOCDB) and CERN IT Status Board

          Many interventions scheduled this week. Please consult the URLs above for details.
      • 16:30 17:00
        WLCG Items 30m
      • 17:00 17:30
        OSG Items 30m
        Speaker: Rob Quick (OSG - Indiana University)
        • Discussion of open tickets for OSG
          Information taken from the weekly escalation reports.

          A reminder was sent to the GGUS developer asking for progress on GGUS-OSG ticket update flow testing, based on GGUS #45488.
          Rob, Kyle or another OSG supporter to re-prompt Felipe Silva to answer on GGUS #45094. The submitter does not accept the suggestion to close the ticket.

      • 17:30 17:35
        Review of action items 5m
      • 17:35 17:35
        AOB