WLCG-OSG-EGEE Operations meeting

Europe/Zurich
28-R-15 (CERN conferencing service (joining details below))

Description
grid-operations-meeting@cern.ch
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum where sites get a summary of the weekly WLCG activities and plans.
Attendees:
  • OSG operations team
  • EGEE operations team
  • EGEE ROC managers
  • WLCG coordination representatives
  • WLCG Tier-1 representatives
  • other site representatives (optional)
  • GGUS representatives
  • VO representatives
  • To dial in to the conference:
    a. Dial +41227676000
    b. Enter access code 0140768

    NB: Reports were not received in advance of the meeting from:

  • ROCs: Russia; SouthWest Europe
  • VOs: Alice; ATLAS; BioMed; CMS
  • Recording of the meeting
      • 4:00 PM - 4:00 PM
        Feedback on last meeting's minutes
        Minutes
      • 4:01 PM - 4:30 PM
        EGEE Items 29m
        • <big> Grid-Operator-on-Duty handover </big>
          From: SW Europe / Russia
          To: France / UK/I


          Issues:
          None in the reports.
        • <big> PPS Report & Issues </big>
          PPS reports were not received from these ROCs:
          AP, CERN, IT, NE, RU, SWE

          Issues from EGEE ROCs:
          1. None received

          Release News:
          1. gLite 3.1.0 PPS Update18 was released to pre-production on Wednesday.
            It has passed the pre-deployment test and is currently being deployed to the full PPS.
            The update contains:
            • 64-bit versions of SE_dpm_mysql and SE_dpm_disk
            • dCache now installs with yum install (not yum groupinstall)
            • bdii version 3.9.1-5
            • yaim-core update
            • support for SGE CEs
            As there is an update of YAIM core, *all* metapackages are reported as affected by this update.
            More detailed info at:
            https://twiki.cern.ch/twiki/bin/view/EGEE/PPSReleaseNotes
        • <big> EGEE issues coming from ROC reports </big>
          1. (ROC CE): There is a GGUS ticket, assigned to ROC CE, from a Central European site concerning inconsistencies in DPM: some files are in the DPM DB but not on disk. We would suggest removing these files manually from the DPM DB to clear the inconsistencies. However, we are not sure whether we should inform the VO(s) about such changes, or what the procedure is when files are lost. For reference, the ticket link: https://gus.fzk.de/ws/ticket_info.php?ticket=33012
          2. (ROC CE): The WARSAW-EGEE site is experiencing problems similar to those at FNAL, which resulted in action 101. WARSAW-EGEE would like to know whether there has been any progress on the issue. The site has a standing reservation in place for OPS VO jobs, but this did not prevent the RM test timeouts.
          3. (ROC SEE): While registering the new Serbian EGEE site (AEGIS07-PHY-ATLAS), we encountered a problem in GOCDB: it does not recognize existing country representatives for newly created sites. See https://gus.fzk.de/pages/ticket_details.php?ticket=32910
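The DPM inconsistency in item 1 (entries present in the DPM DB but missing on disk) comes down to a set difference between two listings. The following is a minimal Python sketch. It assumes the site has already exported both the file entries recorded in the DPM database and the files actually present on the disk servers, normalized to the same form (for example, physical replica paths). How those listings are obtained is not covered here. The function name and example paths are hypothetical.

```python
"""Compare a DPM database listing against the files present on disk (sketch)."""

from typing import Iterable, List, Tuple


def find_inconsistencies(
    db_paths: Iterable[str], disk_paths: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (missing_on_disk, not_in_db), each sorted for stable reports.

    missing_on_disk: recorded in the DB but absent from disk (the case in
    the GGUS ticket); not_in_db: present on disk but unknown to the DB.
    Both inputs are assumed to use the same path convention.
    """
    db = set(db_paths)
    disk = set(disk_paths)
    return sorted(db - disk), sorted(disk - db)


if __name__ == "__main__":
    # Hypothetical example listings, not taken from any real site.
    db = ["disk01:/storage/ops/a", "disk01:/storage/ops/b", "disk01:/storage/ops/c"]
    disk = ["disk01:/storage/ops/a", "disk01:/storage/ops/d"]
    missing, unknown = find_inconsistencies(db, disk)
    print("missing on disk:", missing)
    print("not in DB:", unknown)
```

The sketch only identifies candidate entries. It does not settle the procedural question raised above of whether, and how, the VOs should be informed before entries are removed.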
        • <big> gLite Release News</big>
          1. Release of gLite 3.1 Update14 to production is in preparation. The update, to be released very soon (by Wednesday), will contain:
            • YAIM module to configure LCG CE and gLite WN for MPI support according to the guidelines from the EGEE TCG working group on MPI
            • Additional MAUI package (better support for running the CE on a separate host from the Torque server)
            • Improved globus-gridftp startup script
            • lcg_util v1.6.8
            • Improvements to glite-info-provider-ldap
            • glite-yaim-core 4.0.3-13 for gLite 3.1
            As there is an update of YAIM core, *all* metapackages are reported as affected by this update. In practice, yaim-core was changed due to an incompatibility, found in PPS, with the released version of glite-info-provider-ldap. The list of impacted services can therefore be restricted to those concerned by the new version of glite-info-provider.
          2. Release of gLite 3.0 Update40 to production is in preparation. The update, to be released very soon, will contain:
            • YAIM module for the 3.0 WMS, fixing the bug in the UID limit for the gridftp server
        • <big>64-bit update</big>
          Speaker: Oliver Keeble (CERN)
          more information
        • <big>Co-installation of middleware services on the same box: known issues?</big>
          Speaker: Oliver Keeble
        • <big>Support for LDAP-based VOs in YAIM (to be removed)</big>
          Speaker: Oliver Keeble
        • <big>YAIM exit codes</big> 5m
          We are about to implement YAIM exit codes, error codes, and associated error messages, to make the interaction between YAIM and fabric management systems a bit easier. In principle they will conform to /usr/include/sysexits.h, plus some additional YAIM-specific exit codes, all below 126. If you use a tool (e.g. Quattor) that could make use of this feature, or you have advice or opinions on how you would like it implemented, please shout now! Instead of flooding the Rollout list, please send your feedback to yaim-contact@cern.ch. Thanks, Gergo
          Speaker: Gergely Debreczeni (KFKI Research Institute for Particle and Nuclear Physics)
          Slides
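The following minimal Python sketch shows how a fabric management tool might interpret such exit codes. It uses the standard sysexits.h values (EX_USAGE = 64 through EX_CONFIG = 78). It treats any other code between 1 and 125 as one of YAIM's own codes. The actual YAIM code assignments had not been defined yet, so that category and the function name are hypothetical. Codes from 126 upward are kept separate because POSIX shells reserve them: 126 means the command was not executable, 127 means it was not found, and 128+n means it was terminated by signal n. This is why the proposal keeps YAIM's own codes below 126.

```python
"""Classify the exit status of a (hypothetical) YAIM run, sysexits.h style."""

# Values as defined in /usr/include/sysexits.h (BSD convention).
SYSEXITS = {
    0: "EX_OK",
    64: "EX_USAGE",
    65: "EX_DATAERR",
    66: "EX_NOINPUT",
    67: "EX_NOUSER",
    68: "EX_NOHOST",
    69: "EX_UNAVAILABLE",
    70: "EX_SOFTWARE",
    71: "EX_OSERR",
    72: "EX_OSFILE",
    73: "EX_CANTCREAT",
    74: "EX_IOERR",
    75: "EX_TEMPFAIL",
    76: "EX_PROTOCOL",
    77: "EX_NOPERM",
    78: "EX_CONFIG",
}


def classify_exit(status: int) -> str:
    """Return a short label that a fabric management tool could log or act on."""
    if status < 0 or status > 255:
        raise ValueError(f"exit status out of range: {status}")
    if status in SYSEXITS:
        return SYSEXITS[status]
    if status < 126:
        # Assumption: YAIM's own additional codes live in this range.
        return f"YAIM_OWN({status})"
    if status == 126:
        return "SHELL_NOT_EXECUTABLE"
    if status == 127:
        return "SHELL_NOT_FOUND"
    if status > 128:
        # Shells report 128 + n when a process is killed by signal n.
        return f"KILLED_BY_SIGNAL({status - 128})"
    return "RESERVED(128)"


if __name__ == "__main__":
    for code in (0, 1, 75, 78, 127, 137):
        print(code, classify_exit(code))
```

With a classification like this, a tool could, for example, retry a run that ends with EX_TEMPFAIL and flag the node for manual attention on EX_CONFIG. That policy is only an illustration, not part of the proposal.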
      • 4:30 PM - 5:00 PM
        WLCG Items 30m
        • <big> WLCG issues coming from ROC reports </big>
        • <big>WLCG Service Interventions (with dates / times where known) </big>
          Link to CIC Portal (broadcasts/news), scheduled downtimes (GOCDB) and CERN IT Status Board
          1. RAL: [Has this happened?] Owing to a potential fire hazard identified at RAL, the site will have to schedule an emergency downtime very soon. I do not yet have exact details of when it will happen or how long it will last, but it is likely to be an intervention of at least one whole working day.

          Time at WLCG T0 and T1 sites.

        • <big>FTS service review</big> 5m

          Please read the report linked to the agenda.
          In particular ?

          Speakers: Gavin McCance (CERN), Steve Traylen
        • <big> ATLAS service </big>
        • <big>CMS service</big>
          • Nothing to report.
          Speaker: Mr Daniele Bonacorsi (CNAF-INFN BOLOGNA, ITALY)
        • <big> LHCb service </big>
          • dCache problem at IN2P3 (GGUS-Ticket 33017)
          • dCache problem at GRIDKA (GGUS-Ticket 33019)
          • Need to define clear procedures for sites in scheduled downtime (SD)
          Speaker: Dr Roberto Santinelli (CERN/IT/GD)
        • <big> ALICE service </big>
          • Nothing to report.
          Speaker: Dr Patricia Mendez Lorenzo (CERN IT/GD)
        • <big> CCRC'08 Operational Review</big>
          • Item 1
          Speaker: Harry Renshall / Jamie Shiers
          Minutes of CCRC08 meetings
      • 5:00 PM - 5:30 PM
        OSG Items 30m
        Speaker: Rob Quick (OSG - Indiana University)
        • Discussion of open tickets for OSG
          Escalation Reports
      • 5:30 PM - 5:35 PM
        Review of action items 5m
        list of actions
      • 5:35 PM - 5:35 PM
        AOB