Regional Operations Meeting DECH

Europe/Zurich
FZK

Sven Hermann
Description
Telephone conference. Details for connecting to the conference were distributed through the regional mailing list.
<HTML> <BODY LANG="en-US" TEXT="#000000" DIR="LTR">

Minutes for Regional Operations Meeting DECH (June 1st, 2007)

Attendance:
Sven Hermann, Clemens Koerdt, Günter Grein (FZK)
Renate Dohmen (MPPMU)
Christian Peter (ITWM)
Stephan Nies (Uni Dortmund)
Andreas Haupt (DESY-ZN)
Alessandro Usai (CSCS)
Christoph Wissing, Uwe Ensslin, Yves Kemp (DESY-HH)
Horst Schwichtenberg (SCAI)


Apologies:
(Uni Wuppertal)

Missing:
(GSI)
(Uni Karlsruhe)
(Uni Siegen)
(RWTH-Aachen)
(Uni Freiburg)



1. Introduction

  • Last meeting's minutes: no comments

  • Announcements:

    • change of DECH Wiki URL: now available via http (instead of https)
      http://twiki.cscs.ch/twiki/bin/view/DECH/WebHome

    • gstat statistics: some info received, to be followed up in more detail (wrong numbers for some sites with more than one CE, #C.K.)

    • EGEE production sites email channel was disrupted; this seems to be solved now.
      Apologies.

    • If you want to write to all EGEE production sites, you first have to register with your personal Savannah account (https://savannah.fzk.de/projects/egee-production/).

    • phasing out classic SE (see mail):
      Are there any features you as site admins would miss? (#all)

    • software release cycle: Sven is collecting feedback on this (mail forwarded by Sven on 31.5. to [egee-production_sites], “Poll: [Fwd: Software release updates cycles]”)

    • Current requirements for operating systems in our region -> added a link to the agenda page. Feedback welcome! Will be forwarded to the TCG soon! Send corrections if necessary.

    • using DECH VO for Gridka school (temporary certs / use with VOMS server)

      • there are some security considerations.

      • partners (DESY, SCAI, after the meeting also Wuppertal) strongly opted for use of the DECH VO for training, instead of creating yet another VO for that purpose.

      • All DECH sites could agree to accept temporary certificates for a limited use case.

      • still issues to clarify. Discussions at FZK ongoing, esp. with security people

    • you can find links to the last three ops meetings on the agenda page

    • Reminder: VOMS cert has changed (still an issue..)

    • top level BDIIs stress test (see link to ops meeting 21.5.)

    • production release 24:

      • yaim, sl4comp.WN (Attention: difficult upgrade path to the natively compiled package later, and therefore not recommended for small sites)

    • production release 25

      • DGAS, lcg-CE, RGMA server bugfixes...

    • changes of pool account mapping introduced with new yaim (two tickets, #22244 and #22306, opened on that, as well as discussions in the rollout list)

    • OS flag (GlueVariable “OS” to specify OS for jobs)

    • PPS 29 – glite 3.1 WN on SL4

    • PPS 30 – canceled

    • PPS 31 – has just been released. Predeployment test report available (see link on agenda page)

    • new release schedule proposed (every two weeks)

      • patches go out on Mondays, stay in for three to four weeks and then go to production if no issues were found

    • EGEE III preparations ongoing. All Partners sent their requests. Negotiations on European level to come.

  • Action Items:

    • PPS certification (get info system and SAM tests working: #! GSI, ITWM)
      -> GSI is partly passing the SAM tests. Next step would be to sustain positive testing for a week. ITWM is delayed due to higher priorities at the moment:
      a replica of the GOC DB web server is up and running: goc2.gridops.org
      Once a stable version of this is achieved, ITWM will go back to PPS.

    • DECH VO: Now only the recently certified sites LRZ and Uni Siegen do not support the DECH VO. (#! Siegen, LRZ, C.K.)
      -> Siegen has started, though still JS problems; LRZ no progress visible

    • Create a criteria catalog to classify MW updates as "major" or "minor", in order to warn site admins regionally in advance and to better advise smaller sites
      -> no feedback received. A rough classification exists on the DECH wiki. Feedback still welcome. Agreed for now. CLOSED



2. Round the Sites

  • CSCS

    • updated dCache with recent patches

    • experiencing timeout problems in connection with replica management

    • rate of transfer not acceptable at the moment

      • might be that the source (Karlsruhe) has a problem...?

      • huge improvement last week, but then deterioration....
        SH: might be useful to open a GGUS ticket on this problem.

    • recent TPM shift without notable problems

---

  • DESY-HH

    • PPS

      • running two WNs in compatibility mode

      • Quattor scripts for native WN package in progress

      • involved in SRM 2.2 testing

      • new PPS release not yet deployed. Hope it will not destroy our gLite CE again, as it usually does

    • Production

      • struggling with changes in deployment for pool accounts

      • fifth RB soon online; too much load currently requires manual interventions

      • Each RB handling dedicated VOs.

      • still having problems with VOMS mapping

      • anybody using VOMS mapping extensively? Feedback welcome!

      • opened a ticket on VOMS: see #21641

---

  • DESY-ZN

    • upgraded to dCache 1.7 with help of developers

    • Running again since Wednesday

---

  • FZK

    • first half of May without troubles.

    • second half of May more problematic:

    • first difficulties with SRM and hanging gridftp doors (all effort is consumed just by maintaining the service; if effort is unavailable, the service degrades quickly)

    • (new SRM certificate test was testing on gridftp port and thus failed. Issue was known to SAM team since end of last year, but was ignored.)

    • The SE via the SRM became inoperative several times. Reason for failure: hanging processes with memory errors in combination with high load. Under investigation.

    • CEs started to lock up recently (I/O processes too slow for the high number of jobs). Procedure to detect and fix this particular failure class is being developed.

    • dCache 1.8beta (SRM 2.2) in PPS

    • Again problems with APEL RPMs after changing specInt value. Some manual hacks were necessary.... Still in contact with Dave Kant...

---

  • GSI
    (No information received)

---

  • ITWM

    • involved in gocdb3 web portal replica

    • production: currently in maintenance

    • installing latest updates

    • LDAP server on CE became unstable recently, reason unknown..?

---

  • LRZ
    (No information received)

---

  • MPPMU

    • glite-job-submit on SGE problem is now solved with help from LRZ

    • some changes to the job manager did the trick

    • VOMS server problem (see ticket DECH#1905, solved at time of writing)

---

  • RWTH
    (No information received)

---

  • SCAI

    • newly employed person: Daniel Rubin(?)

    • TPM shift was normal

    • WMS: still problems with this service (processes are hanging, well known to developers...)

---

  • Uni Dortmund

    • new site admin: Stefan Nies, will represent Dortmund in this meeting

---

  • Uni Freiburg
    (No information received)

---

  • Uni Karlsruhe

    • Yves: short status: still in SD

    • currently problems with information not being published (ticket opened for that)

    • Gregory Schott in charge of the cluster now.

---

  • Uni Siegen
    (No information received)

---

  • Uni Wuppertal
    (No information received)

---





3. Feedback to the TCG



4. COD

  • Next COD shift will be 11.6.-18.6., during the week of the Stockholm workshop (three COD DECH people attending).

  • Will have an internal COD meeting the week after. Proposal is 21.6. (Thursday)

  • Currently huge problems with SAM portal: missing tests, LFC errors, etc.

  • Still lots of useless error messages. (expect discussions in forthcoming COD meeting)

  • New host-cert tests caused some controversial discussions about the procedure to apply before making them critical.

  • Signal-to-noise ratio in official communication channels like EGEE broadcast seems too low now. Some improvements are currently being discussed within the CIC team.

  • Working group of TIP

  • Encouragement to site admins to improve/extend GOC Wiki, which is a community effort...







5. ROC-On-Duty

  • Handover GSI to CSCS

    • GSI not present in meeting. CSCS agrees to contact them offline

    • Any problematic tickets?

      • YK: #21600 -> UI installation on file system where you are not root (voms-proxy-init). Got no feedback from developers. Better to use rollout list?!?

      • SH: contacting developers is always problematic. But posting on the rollout list is not enough. GGUS tickets on which the support unit does not become active should be reported in your weekly site reports. They can then be escalated to Monday's operations meetings.



6. AOB

Reminder: Operations workshop in Stockholm, 13-15th June, agenda available:
http://indico.cern.ch/conferenceTimeTable.py?confId=12807


</BODY> </HTML>
    • 15:00 15:05
      Introduction 5m
      Announcements
      - Comments on last meeting's minutes
      - WLCG/EGEE Operations Meeting 7.5. agenda
      - WLCG/EGEE Operations Meeting 14.5. agenda
      - WLCG/EGEE Operations Meeting 21.5. agenda
      - Reminder: Operations Workshop Stockholm (link)
      - Production update 24 and 25 (link)
      - PPS updates 29, 31 (today) (mail "Release of new update to PPS: PPS-UPDATE nn" to project-eu-egee-pre-production-service@cern.ch) (link)
      - New future release schedule for PPS (link)
      - EGEE3 preparation
      Action list
      Speaker: Sven Hermann
    • 15:05 15:40
      Round the sites 35m
      Speaker: ALL
      • CSCS
      • DESY-HH
      • DESY-ZN
      • FZK
      • GSI
      • ITWM
      • LRZ
      • MPPMU
      • RWTH Aachen
      • SCAI
      • Uni Dortmund
      • Uni Freiburg
      • Uni Karlsruhe
      • Uni Siegen
      • Uni Wuppertal
    • 15:40 15:45
      Feedback to the TCG 5m
      Speaker: Horst Schwichtenberg
    • 15:45 15:50
      Status COD DECH 5m
      - Organisation
      - New and ongoing discussions
      - Report about shift (if applicable)
      Speaker: Clemens Koerdt
    • 15:50 15:55
      Status ROC-DECH-On-Duty-Support 5m
      Speaker: GSI -> CSCS
    • 15:55 16:00
      AOB 5m