Minutes for Regional Operations Meeting DECH (September 8th, 2006)
Attendance:
Christian Peter (ITWM)
Andreas Haupt (DESY-ZN)
Torsten Antoni, Clemens Koerdt, Sven Hermann <chair> (FZK)
Ute Kabarek (SCAI)
Andreas Gellrich (DESY-HH)
Apologies:
CSCS (report received by Peter Kunszt)
Missing:
(GSI)
(RWTH-Aachen)
(MPPMU)
(Uni Freiburg)
(Uni Dortmund)
(Uni Karlsruhe)
(Uni Wuppertal)
1. Introduction
(announcements and open issues, also from recent meetings)
- gLite updates are service-based from now on (updates can be voluntary
or mandatory)
- Quattor installation problems
o complete reinstallation necessary
o SA3 never tests without new reconfiguration (be aware)
- Security updates are announced to be installable on the fly
- new FCR tool (Freedom Of Choice) for VOs released and available now:
VOs can choose sites, criteria for job submission ...
- OPS-VO: Coming next Monday: three sites with open tickets
- DECH VO:
o Formal introduction finished? Website is still being worked on
(CIC registration is done)
o SFT server using the DECH VO is in progress, to be available
within two weeks
o Status Poster? (Horst, no information -> Ute asks offline)
o /bin/hostname on all regional sites? -> formalise: 1.
'How-to' to be sent around, 2. test sites, 3. tickets to be opened by
COD DECH (!# SCAI) [VOMS functionality only at DESY/SCAI (SCAI also
wants more gLite job submission), VOMS key downloadable from CIC portal]
- Communication channels: still in progress, delayed due to vacation
period
- VOs: Clemens sends list around (!#)
- Assess grid: So far FZK, CSCS -> Ganglia would be nice (link is
probably sufficient) (!#ALL: Please participate!)
- EGEE:
o Conference -- Oversight SA1 ROC -- Asking for status - please
give us material
o ROC-Managers: EGEE2: Partner Monitoring Review: report to be
prepared, review to be prepared for January (!# Please help us in
preparing this)
o Still some discussion on individual tasks necessary
2. Round the sites (Status)
CSCS
----
- Business as usual
- Security update soon
- ROC on duty: for comments see below
DESY-HH
----
- Not much news
- hardware updates
- problems with RB some time ago (MySQL database growing beyond 4GB)
o How to do this garbage collection safely, rollout discussions
SH: RB MySQL was discussed thoroughly some time ago in the
operations meeting. An action point was opened: developers should
provide a solution. Feedback from sites: preliminary solution is to
delete the data manually
o Second point concerning maintaining services in a controlled
and safe way will be put forward in the next operations meeting (done
S.H.)
AH: Storage element is particularly critical (loss of data):
names in file catalogues are connected to specific machines (replicas
look as if they were still there). This old problem comes back from
time to time
DESY-ZN
-----
- Nothing in particular
- rollout of last update done last week
- no major problems encountered
FZK
----
- problems with gridftp doors, not stable, high load -> still under
investigation
- some open tickets - main admin was sick
- business as usual (high load -- several thousand jobs currently queued)
ITWM
----
- business as usual
- cluster load good
- next week: small OS update, security update, plus 4TB of additional
storage
- status PPS: hardware still missing, so pending; will come back to
the request soon.
SCAI
----
- DNS server update has led to some alias problems: please give
feedback if applicable
- automatic upgrades: 3.0.2, Security included
- DAG job submission problems with 3.0.2: Ticket has been opened
3. Status COD DECH
- official duty shift as lead team
o status grid: some problems with rm-failures in combination
with timeouts: slow BDII, Network?
o some obstacles during shift due to crashed COD web portal and
missing SFTs
o migration of monitoring framework from SFT to SAM to be
finalised next week
o COD-DECH will present feedback on COD task in next official
COD meeting
4. TPM
- planned
- see transparencies: on status (November training 6-7.9, then first
shift as backup, then shift as lead team the following week, GSI 6PM,
CSCS 2PM)
5. ROC on duty
- Handover: CSCS: business as usual (SH: ops tickets become more
important)
- Please give feedback about published workflow (!# ALL)
6. AOB
---
(none)
--
Dr. Sven Hermann sven.hermann@iwr.fzk.de
Forschungszentrum Karlsruhe Tel.: +49-7247-828632
Institute for Scientific Computing / Inst. f. Wissenschaftliches Rechnen
Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany