Minutes for Regional Operations Meeting DECH (July 28th, 2006)
Attendance*:
Peter Kunszt (CSCS)
Andreas Gellrich (DESY-HH)
Günter Grein, Sven Hermann <chair> (FZK)
Kilian Schwarz (GSI)
Christian Peter (ITWM)
Andreas Nowack (RWTH Aachen)
Horst Schwichtenberg (SCAI)
Hans-Gunther Borrmann (Uni Freiburg)
Missing* (excused):
(DESY-ZN)
(MPPMU)
(Uni Dortmund)
(Uni Karlsruhe)
(Uni Wuppertal)
(* sorted by site name)
1. Introduction
- announcement for this meeting was found sufficient this time.
- Comments on last meeting's minutes
- Announcements
o Communication channels (add, clarify) (in progress, obstacle:
  holiday period - #! Sven)
o Top 5 MW changes: sent (done)
o dCache solution checked and ok (done)
o Quattor Group led by Michel Jouvin: Please visit
  http://trac.lal.in2p3.fr/LCGQWG and get in contact (#! DESY, RWTH,
  Freiburg and anyone else who might be interested)
o tickets opened for missing OPS VO support (still four DECH sites
  affected, !# solve soon please: less than two weeks left)
o Reminder: GridKa School from September 11-15, 2006. Please find
  further information at http://www.fzk.de/gks06 .
2. Status and Plans for "DECH" VO
- Rollout plan in our region:
o make DECH VO available for "/bin/hostname" at every DECH site by
  mid-September (with at least one job slot) (!# ALL, done by most)
o set up regional SFT server soon (!# GridKa) and start testing DECH VO
o agree on critical tests at DECH sites concerning DECH VO later
- Formal introduction of DECH VO by SCAI (AUP, Webpage, Entry in
CIC-Portal, ..) (!# Horst)
- DECH VO should also be used in future to provide (preliminary) first
resources for newly registered VOs in EGEE.
Status:
lcg-infosites --vo dech ce
valor del bdii: exp-bdii.cern.ch:2170
#CPU  Free  Total Jobs  Running  Waiting  ComputingElement
----------------------------------------------------------
  33     2           0        0        0  ce01-lcg.projects.cscs.ch:2119/jobmanager-lcgpbs-dech
 162    12           0        0        0  grid-ce0.desy.de:2119/jobmanager-lcgpbs-dech
 152     9           0        0        0  grid-ce1.desy.de:2119/jobmanager-lcgpbs-dech
  86    13           0        0        0  grid-ce2.desy.de:2119/jobmanager-lcgpbs-dech
 100     1           0        0        0  lcg-ce0.ifh.de:2119/jobmanager-lcgpbs-dech
1716   227           0        0        4  ce-fzk.gridka.de:2119/jobmanager-pbspro-dech
1716   224           0        0        4  a01-004-128.gridka.de:2119/jobmanager-pbspro-dech
   0     0           0        0        0  lcg-ce.gsi.de:2119/jobmanager-lcglsf-dech
  94     0           0        0        0  grid-ce.physik.rwth-aachen.de:2119/jobmanager-lcgpbs-dech
  20    20           0        0        0  scaicl0.scai.fraunhofer.de:2119/jobmanager-lcgpbs-dech
   6     6           0        0        0  grid-ce.physik.uni-wuppertal.de:2119/jobmanager-lcgpbs-large
   6     6           0        0        0  grid-ce.physik.uni-wuppertal.de:2119/jobmanager-lcgpbs-short
   6     6           0        0        0  grid-ce.physik.uni-wuppertal.de:2119/jobmanager-lcgpbs-medium
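For sites that want to check the rollout target (at least one job slot for the DECH VO) against such a listing, a minimal Python sketch follows. It is only an illustration, not part of any agreed procedure: it assumes that each data row consists of five integers followed by the CE identifier, and that the #CPU column can be read as the number of job slots. The function names and the three sample rows (copied from the status listing) are chosen for this example only.

```python
# Sketch: parse `lcg-infosites --vo dech ce` style output and flag CEs
# that advertise no job slots. Assumption (not confirmed by the minutes):
# every data row is five integers followed by the CE identifier, and the
# #CPU column stands for the number of job slots.

SAMPLE = """\
#CPU  Free  Total Jobs  Running  Waiting  ComputingElement
----------------------------------------------------------
  33     2           0        0        0  ce01-lcg.projects.cscs.ch:2119/jobmanager-lcgpbs-dech
1716   227           0        0        4  ce-fzk.gridka.de:2119/jobmanager-pbspro-dech
   0     0           0        0        0  lcg-ce.gsi.de:2119/jobmanager-lcglsf-dech
"""


def parse_ce_rows(text):
    """Return one dict per data row; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A data row has exactly six fields, the first five being integers.
        if len(fields) != 6 or not all(f.isdigit() for f in fields[:5]):
            continue
        cpus, free, total_jobs, running, waiting = map(int, fields[:5])
        rows.append({
            "ce": fields[5],
            "cpus": cpus,
            "free": free,
            "total_jobs": total_jobs,
            "running": running,
            "waiting": waiting,
        })
    return rows


def ces_without_slots(rows):
    """List the CE identifiers that advertise zero CPUs."""
    return [row["ce"] for row in rows if row["cpus"] == 0]


rows = parse_ce_rows(SAMPLE)
for ce in ces_without_slots(rows):
    print("no job slots advertised:", ce)
```

To use it on live data, the output of the lcg-infosites command would be passed to parse_ce_rows instead of SAMPLE.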
3. Round the Sites (encountered problems, issues to discuss, ops-VO,
gLite update status, ..)
*CSCS*
- no problems with operation of site
- COD DECH shift went fine
- upgrade of number of CPUs planned by end of year
*DESY-HH*
- training of new colleague (Christoph Wissing) at DESY, additional
documentation created
- moved LFC catalogue to gLite (MySQL) -> Comment: straightforward.
- ongoing CMS production
- comment: general holiday period in DECH (excused for next meeting)
- COD DECH member at DESY: Christoph Wissing
*DESY-ZN*
(excused, nothing to report)
*FZK*
- complete power cut after cooling system failure
- FTS server 1.4 switched off
- FTS channels to 8 Russian (RDIG) ALICE Tier2 sites set up and tested
- PPS pending
- OPN to CERN production postponed (update 05.08.2006: up and running)
- some configuration problems fixed -> improvement in SFT results
- Site-BDII/GIIS moved to extra virtual machine, but still problems
with information system (investigating)
A.G.: DESY-HH has dedicated machines with 2 CPUs each for
Site-GIIS and Top-Level-BDII.
*GSI*
- gLite 3.x in test environment (migration experience as usual)
- hard disk crash on CE
- LFC outage, weird error messages about status of DB, reboot solved
problem
- ALICE data challenge ongoing
- SC4 Tier2 meeting 18./19.9. in Munich
http://www.etp.physik.uni-muenchen.de/sc4workshop/registration.php
- problem with wrong information on EGEE webpages about gLite MW
(reported successfully in Operations Meeting 31.7.2006)
- suggestion: decouple duties COD, ROD, TPM
Discussion outcome: first go on setting up duties, maybe change policy
in the long term, to be discussed later
*ITWM*
- queue priority problem with OPS VO, to be solved soon.
- R-GMA (data outage)
- general holiday period
- good experiences with COD DECH, but central outage of CIC portal or
SFTs sometimes
*MPPMU*
- upgraded to gLite 3.0 successfully
- waiting for new WN tarball soon
- issue to solve: APEL connection to SGE server logfiles from CE
- OPS VO support: to be ready soon
- SFTs: ok now.
*RWTH Aachen*
- dCache up and running (also with SFTs)
- Quattor templates available and usable now
*SCAI*
- positive feedback about COD shift, though much work
- gLite in Production and PPS
- WMS (gLite-flavoured), test differences gLite-flavour/lcg-flavour
- set up DECH VO (AUP, webpage etc.)
- NA4 docking
- production Biomed
* hardware prob: hard disks
* SFT problem, but prod was constantly up and running
- DECH VO Poster for EGEE conference planned (!# ALL: send suggestions
to Horst)
- standalone MW installation (discussing with Cal Loomis)
- documentation of gLite API
*Uni Dortmund*
- routine production for LHCb and H1
- no specific problems (in spite of heat)
- Site Dortmund correctly visible in Google map again
- ToDo: Upgrade CAs, Support DECH VO
*Uni Freiburg*
- gLite 3.0: many problems with update
- dCache: different DB structure advised, SE reinstallation, TB
data loss, backup dCache but different problems then.
- CE reinstallation (R-GMA, PBS config)
*Uni Karlsruhe*
(excused)
- in scheduled downtime
- reconfiguration due to e.g. security issues
*Uni Wuppertal*
(excused)
- conflict with different meeting at the same time as this one here
****
4. ROC-on-duty (EGEE funded effort)
Status: Workflow is documented now. To be published soon.
Handover FZK - SCAI: majority of tickets about "Accounting" and
"OPS VO", business as usual
5. Setup of COD-DECH (EGEE funded effort)
Status: See comments above, successful third unofficial COD-shift for
ROC-DECH. (Update: positive feedback received by CE ROC)
6. AOB
none
--
Dr. Sven Hermann sven.hermann@iwr.fzk.de
Forschungszentrum Karlsruhe Tel.: +49-7247-828632
Institute for Scientific Computing / Inst. f. Wissenschaftliches Rechnen
Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany