WLCG-OSG-EGEE Operations meeting

Europe/Zurich
28-R-15 (CERN conferencing service (joining details below))

Nick Thackray, Steve Traylen (CERN)
Description
grid-operations-meeting@cern.ch
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
Attendees:
  • OSG operations team
  • EGEE operations team
  • EGEE ROC managers
  • WLCG coordination representatives
  • WLCG Tier-1 representatives
  • other site representatives (optional)
  • GGUS representatives
  • VO representatives
  • To dial in to the conference:
    a. Dial +41227676000
    b. Enter access code 0148141

    OR click HERE
    (Please specify your name & affiliation in the web-interface)

    Click here for minutes of all meetings

    Click here for the List of Actions

      • 1
        EGEE Items
        • a) Grid-Operator-on-Duty handover
          From: SouthEast and Russia
          To: CERN and Germany, Switzerland


          Nothing to report from Germany or Switzerland.
        • b) PPS Report & Issues
          Issues from the EGEE ROCs and general information can be found at:

          https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingPps
        • c) gLite Release News
        • d) EGEE issues coming from ROC reports
          • CentralEurope: The CE RM test fails when the site SE is in downtime. Central Europe ROC has recommended that site administrators put both the SE and the CE in downtime whenever the SE needs to be in downtime. Additionally, CE-only sites should configure another site's SE as their close SE, which requires an agreement with the administrator of the site that owns the SE. Do other ROCs have similar problems, and if so, how did they solve them?

          • Germany, Switzerland: UNI-BONN has problems with its APEL publisher. On 2009-01-13 Robert Zimmermann (UNI-BONN) raised a ticket (GGUS:45231) asking for help from the R-GMA experts, but to date nobody from the R-GMA support unit has responded to it. On 2009-01-19 UNI-BONN then received a ticket for failing APEL tests (GGUS:45405).

            Comment from chair (Steve): the ticket was assigned to the wrong group, although this was not obvious. RAL-LCG2 can correct the situation.

          • Italy: Feedback about the new BDII release (ref. GGUS:43230, Savannah PATCH:2671).

            Before the update, we saw random error messages from Nagios ("Could not search/find objectclasses in mds-vo-name=local,o=grid") and from the SAM tests ("egee-bdii.cnaf.infn.it:2170: ERROR: Internal (implementation specific) error lcg_gt: Invalid argument").

            All Italian top-BDII instances were updated on 15 January. After the update, both error messages disappeared.

          • Russia: GGUS:45333 has been assigned to the R-GMA team since 15 January without any response. As a result, the BY-NCPHEP site cannot operate properly.

            Comment from chair (Steve): same situation as the UNI-BONN site above. I have now reassigned the ticket to RAL-LCG2 for resolution. More generally, I will contact the R-GMA folks, since they should have both on.

          • SouthEastern: I got yet another report regarding the BDII stability issue.

            Comment from chair (Steve): please provide more information.

          • SouthWestern: LIP complains that ATLAS is using 5 GB in the worker nodes' /tmp directory. LIP's WNs have 8 cores, and most of the disk space is dedicated to the /home directories. It would be useful to know, for all the LHC VOs, the disk requirements for /home and TMPDIR (the scratch space) per job.

            Comment from chair (Steve): see the CIC portal VO cards, which contain exactly this information. If they do not reflect your observations, raise a GGUS ticket.
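
To put the SouthWestern report in numbers, here is a minimal sketch of the worst-case scratch usage on one worker node. The figures (8 cores, 5 GB of /tmp) come from the LIP report; the function name `scratch_needed_gb` is hypothetical, and it is an assumption, not something stated in the report, that each core runs one job and that the 5 GB is used per job.

```python
def scratch_needed_gb(job_slots: int, scratch_per_job_gb: float) -> float:
    """Worst-case scratch usage on one worker node if every slot runs such a job."""
    return job_slots * scratch_per_job_gb


# Figures from the LIP report: 8 cores per worker node, 5 GB of /tmp used by ATLAS.
# Assumption (not in the report): one job slot per core, 5 GB used per job.
needed = scratch_needed_gb(job_slots=8, scratch_per_job_gb=5.0)
print(f"Worst-case /tmp usage per worker node: {needed:.0f} GB")
```

Under these assumptions a fully loaded worker node would need 40 GB of scratch space, which can be compared with the per-job requirement published on the VO card.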

        • e) SAM
          A new version of the SAM portal is available on the Validation server at:

          https://sam-val.cern.ch:8443/sam-gw/sam.py

          The portal fixes the broken history display of previous versions by invoking the corresponding GridView pages. The portal at the above URL points to the Production DB, since GridView has no Validation setup. Users are encouraged to have a look and send any feedback to judit.novak at cern.

        • f) Operational Security
          1. SecOp at Beijing
          2. Change csirt email address
          3. CAs installation set.
      • 2
        WLCG Items
      • 3
        OSG Items
        Speaker: Rob Quick (OSG - Indiana University)
        • a) Discussion of open tickets for OSG
      • 4
        Review of action items
      • 5
        AOB