Minutes for Regional Operations Meeting DECH (September 8th, 2006)

Attendance:
Christian Peter (ITWM)
Andreas Haupt (DESY-ZN)
Torsten Antoni, Clemens Koerdt, Sven Hermann <chair> (FZK)
Ute Kabarek (SCAI)
Andreas Gellrich (DESY-HH)

Apologies:
CSCS (report received from Peter Kunszt)

Missing:
(GSI)
(RWTH-Aachen)
(MPPMU)
(Uni Freiburg)
(Uni Dortmund)
(Uni Karlsruhe)
(Uni Wuppertal)


1. Introduction
(announcements and open issues, also from recent meetings)

- gLite updates are service-based from now on (updates can be voluntary or mandatory)
- Quattor installation problems
  o complete reinstallation necessary
  o SA3 never tests updates without a new reconfiguration (be aware)
- Security updates are announced to be installable on the fly
- new FCR tool (Freedom of Choice) for VOs released and available now: VOs can choose sites, criteria for job submission ...
- OPS-VO: Coming next Monday: three sites with open tickets
- DECH VO:
  o Formal introduction finished? Website is still being worked on (CIC registration is done)
  o SFT server using the DECH VO is in progress, to be available within two weeks
  o Status Poster? (Horst, no information -> Ute asks offline)
  o /bin/hostname on all regional sites? -> formalise: 1. 'How-to' to be sent around, 2. test sites, 3. open tickets by COD DECH (!# SCAI) [VOMS functionality only at DESY/SCAI (SCAI also wants more gLite job submission); VOMS key downloadable from CIC portal]
- Communication channels: still in progress, delayed due to vacation period
- VOs: Clemens sends list around (!#)
- Assess grid: So far FZK, CSCS -> Ganglia would be nice (link is probably sufficient) (!#ALL: Please participate!)
- EGEE:
  o Conference -- Oversight SA1 ROC -- Asking for status - please give us material
  o ROC-Managers: EGEE2: Partner Monitoring Review: report to be prepared, review to be prepared for January (!# Please help us in preparing this)
  o Still some discussion on individual tasks necessary


2. Round the sites (Status)

CSCS
----
- Business as usual
- Security update soon
- ROC on duty comments: see below

DESY-HH
----
- Not much news
- hardware updates
- problems with the RB some time ago (MySQL database growing beyond 4 GB)
  o How to do this garbage collection safely? Rollout discussions
     SH: RB MySQL was discussed thoroughly some time ago in the operations meeting and an action point was opened: developers should provide a solution. Feedback from sites: just delete the data manually (preliminary solution)
  o Second point concerning maintaining services in a controlled and safe way will be put forward in the next operations meeting (done S.H.)
    AH: Storage element is particularly critical (loss of data): names in file catalogues are tied to specific machines (replicas look as if they were still there). This old problem comes back from time to time

DESY-ZN
-----
- Nothing in particular
- rollout of last update done last week
- no major problems encountered

FZK
----
- problems with gridftp doors, not stable, high load -> still under investigation
- some open tickets - main admin was sick
- business as usual (high load -- several thousand jobs currently queued)

ITWM
----
- business as usual
- cluster load good
- next week: small OS update, security; 4 TB of storage in addition
- status PPS: hardware still missing, so pending; will come back to the request soon.

SCAI
----
- DNS server update has led to some alias problems: please give feedback if applicable
- automatic upgrades: 3.0.2, Security included
- DAG job submission problems with 3.0.2: Ticket has been opened


3. Status COD DECH
- official duty shift as lead team
  o status grid: some problems with rm-failures in combination with timeouts: slow BDII, Network?
  o some obstacles during shift due to crashed COD web portal and missing SFTs
  o migration of monitoring framework from SFT to SAM to be finalised next week
  o COD-DECH will present feedback on COD task in next official COD meeting


4. TPM
- planned
- see transparencies on status (November training 6-7.9, then first shift as backup, then shift as lead team the following week, GSI 6PM, CSCS 2PM)


5. ROC on duty
- Handover: CSCS: business as usual (SH: ops tickets become more important)
- Please give feedback about published workflow (!# ALL)


6. AOB
---
(none)



-- 
Dr. Sven Hermann sven.hermann@iwr.fzk.de
Forschungszentrum Karlsruhe Tel.: +49-7247-828632
Institute for Scientific Computing / Inst. f. Wissenschaftliches Rechnen
Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany