Minutes for Regional Operations Meeting DECH (May 4th, 2007)
Attendance:
Sven Hermann (chair), Clemens Koerdt, Andreas Heiss (FZK)
Alessandro Usai (CSCS)
Andreas Haupt, Kai Leffhalm (DESY-ZN)
John Kennedy (LRZ)
Renate Dohmen (MPPMU)
Kläre Cassirer, Horst Schwichtenberg (SCAI)
Apologies:
(Uni Wuppertal)
(RWTH-Aachen)
(Uni Freiburg)
(ITWM)
Missing:
(Uni Dortmund)
(GSI)
(DESY-HH)
(Uni Karlsruhe)
(Uni Siegen)
1. Introduction
Last meeting's minutes: no comments
Announcements:
Volunteer PPS sites wanted for SRM 2.2 (currently DESY and FZK)
The CIC portal has an email notification system for SAM alarms; feel free to use it!
A few production updates:
22: VOMS service, YAIM, LB bug fixes
23: edg-mkgridmap, DGAS logging, lcg-info command updated with new functionality, new VOMS certification updates
PPS update 28
Conference in the coming week: OGF/User Forum in Manchester
New site in region: UNI-SIEGEN-HEP now fully certified
Keep timeslot of this conference (only a few sites responded)
Action Items:
PPS certification (get info system and SAM tests working: #! GSI, ITWM)
-> ITWM - no update, GSI now passing SAM tests. OPEN.
Dech VO: Only the recently certified sites LRZ and Uni Siegen do not yet support the Dech VO.
-> LRZ needs a couple of weeks, Uni Siegen is yet to be informed (#C.K.)
Create a criteria catalog to classify MW updates as "major" or "minor", so that site admins in the region can be warned in advance and smaller sites can be better advised
-> see DECH Wiki entry (link on agenda page). Please contribute! To be agreed at the next meeting. (CERN is now also specifying service updates.) OPEN.
AU: New updates that require downtimes should be grouped in order to reduce overall downtimes
SH: The release team is already respecting this. This criteria catalog is for our region only, to make sites more aware of the issues.
Coordination meeting T1-T2 concerning storage/transfers to be organized? ->
Andreas Heiss: Concerning discussions on use cases, sharing information, we are already in contact with Feichtinger. Is there a need for more discussions?
Alessandro: Well, perhaps there is no need for an extra meeting, but we are currently experiencing transfer problems between CSCS and FZK. When it comes to planning, it would be helpful to know how the experiments want to distribute the data among the different T2s.
Andreas Heiss: At FZK we have the same problem; the way data is distributed is planned by the experiments. One could have a look at their computing models, but they tend to change them...
Alessandro: Particular questions are: How many pools do we need? How many hot files should we expect (that need to be close to the WN)?
Andreas Heiss: There is actually a working group at CERN to work out concepts for T1 storage management. Might be helpful also for T2s. Will contact them (#A.He.)
(We might also expect experiments to want access to raw data once the physics has started.)
John Kennedy: As far as ATLAS is concerned, get in contact with me. We can talk about this.
2. Round the Sites
CSCS
Increased our number of CPUs again
Currently employing two thumpers for our dCache installation
Production running fine
All our WNs are on SL4
ATLAS reports a problem: unable to install more software -> still under investigation
CMS also experiencing problems with tools to update experiment software
There was a problem with the NorduGrid branch -> migration to a more powerful machine
dCache: a bit of a problem; transfers from Karlsruhe often fail, have to understand why
Next week will update to the most recent dCache patches
Might see improvements then...
Discussing the need for more storage hardware.
AH: Star FTS channels changed to copy mode. (No feedback from Derek Feichtinger yet)
AU: Failure rate still too high
---
DESY
(No information received)
---
DESY-ZN
Production running smoothly
Some problems when transferring files from Karlsruhe
E.g. the error "file exists"
A.He: should be fixed with the next transfer version
A.Ha: the files are actually transferred correctly but then FTS does not recognize them!?!
Is our older 1.6.6 dCache version the reason?
Plan to upgrade to 1.7. (Looking for support from experienced admins...)
---
FZK
dCache: new patch 35 on the GridFTP doors. Downtime is necessary for the head node.
Quite stable last week (still experiencing about one timeout a day, however...)
The number of failing connections to sam-bdii.cern.ch went down a lot
CE: fixed inconsistency in queues reported after IS update
dCache 1.8beta (SRM 2.2) in PPS
Updated APEL RPMs, waiting for some statistics
---
GSI
(No information received)
---
ITWM
(No information received)
---
LRZ
Managed to lose some files on dCache pools
Looking into this
---
MPPMU
Problem with SGE support still open
LRZ site does not have the same problems despite their similarities
Will work together on that
---
RWTH
(No information received)
---
SCAI
Business as usual
CE problem: a stalled old job blocked new jobs for a few days (now deleted)
Running fine again
---
Uni Dortmund
(No information received)
---
Uni Freiburg
(No information received)
---
Uni Karlsruhe
(No information received)
---
Uni Siegen
(No information received)
---
Uni Wuppertal
(No information received)
---
3. Feedback to the TCG
Horst: sent the questionnaire to the partners involved (S.H. and K.S.); awaiting their feedback
4. COD
Next COD shift will be during the week of May 7-14.
5. ROC-On-Duty
Handover FZK to SCAI
FZK: no specific points for feedback
No questions from SCAI
SH: Statistics: 46 tickets were created and 53 tickets solved during the last two weeks.
6. AOB
HS: Our regional middleware support group: is there a policy on who is actually responsible for a ticket?
SH: Anybody who sees a solution (or sees himself as an expert) should assign the ticket to himself. If nobody can be found, forward the ticket to an appropriate expert support unit.
There was an agreement on that some time ago (try to find the mail and forward it to Horst)
AU: Any news on when the LCG CE will no longer be supported because of its replacement by the gLite CE?
SH: The update is that ceasing support for it in June will probably not be possible. At an Operations Meeting in March (http://indico.cern.ch/conferenceDisplay.py?confId=13619) Ian Bird presented a criteria catalog for the gCE and also the WMS, which specifies the requirements those services must meet to replace their old LCG counterparts.
Next meeting will be on the 1st of June.