Minutes for Regional Operations Meeting DECH (July 28th, 2006)

Attendance*:
Peter Kunszt (CSCS)
Andreas Gellrich (DESY-HH)
Günter Grein, Sven Hermann <chair> (FZK)
Kilian Schwarz (GSI)
Christian Peter (ITWM)
Andreas Nowack (RWTH Aachen)
Horst Schwichtenberg (SCAI)
Hans-Gunther Borrmann (Uni Freiburg)

Missing* (excused):
(DESY-ZN)
(MPPMU)
(Uni Dortmund)
(Uni Karlsruhe)
(Uni Wuppertal)

(* sorted by site name)

1. Introduction
- announcement for this meeting was found sufficient this time.
- Comments on last meeting's minutes
- Announcements
  o Communication channels (add, clarify)  (in progress, obstacle: holiday period - !# Sven)
  o Top 5 MW changes: sent (done)
  o dCache solution checked and ok (done)
  o Quattor Group led by Michel Jouvin: Please visit http://trac.lal.in2p3.fr/LCGQWG and get in contact (!# DESY, RWTH, Freiburg and anyone else who might be interested)
  o tickets opened for missing OPS VO support (still four DECH sites affected, !# solve soon please: less than two weeks' time left)
  o Reminder: GridKa School from September 11-15, 2006. Please find further information at http://www.fzk.de/gks06 .

2. Status and Plans for "DECH" VO

- Rollout plan in our region:
  o make DECH VO available for "/bin/hostname" at every DECH site by mid-September (with at least one job slot) (!# ALL, done by most)
  o set up regional SFT server soon (!# GridKa) and start testing DECH VO
  o agree on critical tests at DECH sites concerning DECH VO later
- Formal introduction of DECH VO by SCAI (AUP, Webpage, Entry in CIC-Portal, ..) (!# Horst)
- In future, the DECH VO should also be used to provide (preliminary) first resources for newly registered VOs in EGEE.

Status:
lcg-infosites --vo dech ce
valor del bdii: exp-bdii.cern.ch:2170
#CPU    Free    Total Jobs      Running Waiting ComputingElement
----------------------------------------------------------
  33       2       0              0        0    ce01-lcg.projects.cscs.ch:2119/jobmanager-lcgpbs-dech
 162      12       0              0        0    grid-ce0.desy.de:2119/jobmanager-lcgpbs-dech
 152       9       0              0        0    grid-ce1.desy.de:2119/jobmanager-lcgpbs-dech
  86      13       0              0        0    grid-ce2.desy.de:2119/jobmanager-lcgpbs-dech
 100       1       0              0        0    lcg-ce0.ifh.de:2119/jobmanager-lcgpbs-dech
1716     227       0              0        4    ce-fzk.gridka.de:2119/jobmanager-pbspro-dech
1716     224       0              0        4    a01-004-128.gridka.de:2119/jobmanager-pbspro-dech
   0       0       0              0        0    lcg-ce.gsi.de:2119/jobmanager-lcglsf-dech
  94       0       0              0        0    grid-ce.physik.rwth-aachen.de:2119/jobmanager-lcgpbs-dech
  20      20       0              0        0    scaicl0.scai.fraunhofer.de:2119/jobmanager-lcgpbs-dech
   6       6       0              0        0    grid-ce.physik.uni-wuppertal.de:2119/jobmanager-lcgpbs-large
   6       6       0              0        0    grid-ce.physik.uni-wuppertal.de:2119/jobmanager-lcgpbs-short
   6       6       0              0        0    grid-ce.physik.uni-wuppertal.de:2119/jobmanager-lcgpbs-medium
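
Note: a listing like the one above can be checked automatically. The following is a minimal Python sketch, assuming the output keeps the six-column layout shown here (five integers followed by the CE endpoint). The function names and the sample excerpt are illustrative only and not part of any gLite/LCG tool.

```python
# Sketch: parse "lcg-infosites --vo dech ce"-style output and list the
# CEs that publish zero free slots. Helper names are hypothetical.

SAMPLE = """\
#CPU    Free    Total Jobs      Running Waiting ComputingElement
----------------------------------------------------------
  33       2       0              0        0    ce01-lcg.projects.cscs.ch:2119/jobmanager-lcgpbs-dech
   0       0       0              0        0    lcg-ce.gsi.de:2119/jobmanager-lcglsf-dech
  94       0       0              0        0    grid-ce.physik.rwth-aachen.de:2119/jobmanager-lcgpbs-dech
  20      20       0              0        0    scaicl0.scai.fraunhofer.de:2119/jobmanager-lcgpbs-dech
"""


def parse_ce_lines(text):
    """Return one dict per data row; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A data row has exactly five integer columns plus the CE endpoint.
        if len(fields) != 6 or not all(f.isdigit() for f in fields[:5]):
            continue
        cpus, free, total, running, waiting = map(int, fields[:5])
        endpoint = fields[5]
        rows.append({
            "host": endpoint.split(":", 1)[0],
            "queue": endpoint.rsplit("/", 1)[-1],
            "cpus": cpus,
            "free": free,
            "total_jobs": total,
            "running": running,
            "waiting": waiting,
        })
    return rows


def hosts_without_free_slots(rows):
    """Sorted host names of CEs that currently publish no free slots."""
    return sorted({r["host"] for r in rows if r["free"] == 0})


if __name__ == "__main__":
    for host in hosts_without_free_slots(parse_ce_lines(SAMPLE)):
        print(host)
```

A published value of zero free slots only reflects what the information system reports at that moment; it does not by itself show that the DECH VO is misconfigured at that site.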

3. Round the Sites (encountered problems, issues to discuss, ops-VO, gLite update status, ..)

*CSCS*
- no problems with operation of site
- COD DECH shift went fine
- upgrade of number of CPUs by end of year
         
*DESY-HH*
- induction of new colleague (Christoph Wissing) at DESY, additional documentation created
- moved LFC catalogue to gLite (MySQL) -> Comment: straightforward.
- ongoing CMS production
- comment: general holiday period in DECH (excused for next meeting)
- COD DECH member at DESY: Christoph Wissing

*DESY-ZN*
(excused, nothing to report)


*FZK*
- complete power cut after cooling system failure
- FTS server 1.4 switched off
- FTS channels to 8 Russian (RDIG) ALICE Tier2 sites set up and tested
- PPS pending
- OPN to CERN production postponed (update 050806: up and running)
- some configuration problems fixed -> improvement in SFT results
- Site-BDII/GIIS moved to extra virtual machine, but still problems with information system (investigating)
  A.G.: DESY-HH has dedicated machines with 2 CPUs each for Site-GIIS and Top-Level-BDII.
      
*GSI*
- gLite 3.x in test environment (migration experience as usual)
- hard disk crash on CE
- LFC outage, weird error messages about status of DB, reboot solved problem
- ALICE data challenge ongoing
- SC4 Tier2 meeting 18./19.9. in Munich
http://www.etp.physik.uni-muenchen.de/sc4workshop/registration.php
- problem with wrong information on EGEE webpages about gLite MW (reported successfully in Operations Meeting 31.7.2006)
- suggestion: decouple duties COD, ROD, TPM
  Discussion outcome: first continue setting up duties, maybe change policy in the long term, to be discussed later

*ITWM*
- queue priority problem with OPS VO, to be solved soon.
- R-GMA (data outage)
- general holiday period
- good experiences with COD DECH, but occasional central outages of CIC portal or SFTs

*MPPMU*
- upgraded to gLite 3.0 successfully
- waiting for new WN tarball soon
- issue to solve: APEL connection to SGE server logfiles from CE
- OPS VO support: to be ready soon
- SFTs: ok now.

*RWTH Aachen*
- dCache up and running (also with SFTs)
- Quattor templates available and usable now

*SCAI*
- positive feedback about COD shift, though much work
- gLite in Production and PPS
- WMS (gLite-flavoured), test differences gLite-flavour/lcg-flavour
- set up DECH VO (AUP, webpage etc.)
- NA4 docking
- production Biomed
  * hardware prob: hard disks
  * SFT problem, but prod was constantly up and running
- DECH VO Poster for EGEE conference planned (!# ALL: send suggestions to Horst)
- standalone MW installation (discussing with Cal Loomis)
- docu API gLite

*Uni Dortmund*
- routine production for LHCb and H1
- no specific problems (in spite of heat)
- Site Dortmund correctly visible in Google map again
- ToDo: Upgrade CAs, Support DECH VO

*Uni Freiburg*
- gLite 3.0: many problems with update
- dCache: different DB structure advised, SE reinstalled, TB data loss, backup dCache but different problems then.
- CE reinstalled (R-GMA, PBS config)

*Uni Karlsruhe*
(excused)
- in scheduled downtime
- reconfiguration due to e.g. security issues

*Uni Wuppertal*
(excused)
- conflict with another meeting at the same time as this one

****

4. ROC-on-duty (EGEE funded effort)
Status:  Workflow is documented now. To be published soon.
Handover FZK - SCAI:  majority of tickets about "Accounting" and "OPS VO", business as usual


5. Setup of COD-DECH (EGEE funded effort)
Status: See comments above, successful third unofficial COD-shift for ROC-DECH. (Update: positive feedback received by CE ROC)

6. AOB
none
-- 
Dr. Sven Hermann sven.hermann@iwr.fzk.de
Forschungszentrum Karlsruhe Tel.: +49-7247-828632
Institute for Scientific Computing / Inst. f. Wissenschaftliches Rechnen
Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
