Attendees
Graeme Stewart GS
Jeremy Coles JC
James Cullen JaC
Alessandra Forti AF
Pete Gronbech PG
Mohammed Kashif MK
Raja Nandakumar RN
Duncan Rand DR
Derek Ross DeR
Sam Skipsey SS
Brian Davies BD
Experiment problems/issues (20')
Review of weekly issues by experiment/VO
- LHCb
RN-Bulk of FEST work done.
RAL had a problem (big ID).
No recon until Sunday (though it ran successfully on Sunday and Monday).
Imperial had a problem with FEST jobs (an infinite loop which ends up being killed by the batch system). LHCb
investigating.
Coming week should be low activity (other than users) except FEST activity on Wednesday at T1.
- CMS
No news or questions
- ATLAS
GS-Reinstalled SS UK box (DDM). Good functionality cf. older version, which had huge backlogs. Cloud taken offline
Sunday morning. Back on at time of meeting.
Brief phase of production.
RAL had 200TB in MCDISK; 2009 space is 300TB. Users need to clear old files, so sites might end up being idle.
Situation should be clearer in the next couple of weeks, after the Chamonix meeting.
FTS heavy load. Channel throttled. Under discussion with T1.
Pre-staging broken at RAL. Bug in CASTOR SRM.
Intend to test pCache. Plan needs to be organised.
DR-Hammer Tests at RHUL running?
GS-GS to chase up.
DR-Is Brunel in production?
GS-Should start working now.
DR-LOCALGROUPDISK usage in London cloud?
GS-LOCALGROUPDISK can be used by all users using DDM of datasets.
GS-Will send a summary of postings around dteam list so as to be able to handle enquiries.
- Other
- Site performance
-- http://pprc.qmul.ac.uk/~lloyd/gridpp/ukgrid.html
-- Relative stability - http://gridmap.cern.ch/gm/
ROC update (25')
***************
- At meeting last week agreed that Oxford will attempt to set up an instance of Nagios for UK-wide testing
MK-Host certificate asked for; working on setting this up.
Broadcast from Steve Traylen regarding changes to Nagios
- Pilot of SCAS in preparation. The gLite release team informed us that they reckon the new SCAS service (Site
Central Authorization Service) to be in a sufficiently stable condition for a pilot service to be set up. In
particular the most severe issues found earlier (memory leaks, bad configuration) were solved. The software is
currently undergoing stress testing in certification. In parallel we contacted the LHC experiments (specifically
CMS, ATLAS and LHCb) in order to address the activity and they were in favour of a controlled deployment in
production of a pilot service based on some instances of SCAS. Specifically LHCb would like a supporting T1 to
be involved in the pilot, and suggest IN2P3 and/or FZK as first choices.
JC-Experiment contacts not known. Who are they?
RN-LHCb contact is Roberto Santinelli
- More sites complain of too much scratch space being used by jobs on WNs (Germany).
JC-VO to check ID cards for space.
GS-UKT2s should contact ATLAS via atlas uk support list
- It is proposed that all remaining gLite 3.0 clients and services will be obsoleted at the end of April 2009.
This proposal will go to the TMB for approval.
DR-Santanu Das not happy with Condor support on CE on 3.1
DeR-T1 has 3.0 CE for small VOs. Plan to move to 3.1.
Announcement: SAM: The intervention scheduled for next Monday on the SAM and GridView databases has been moved
to next Wednesday, 4th of February. During this downtime the SAM and GridView services will be down, including
submissions, web services and interfaces. This downtime is required to improve the database schemas of these two
services, moving common objects to a separate account, thus easing any future modifications.
JC-Start with deputy T2C who will shadow T1 for next couple of COD sessions before DT2C take over from Tier1.
WLCG update
*****************
- Change management responses seem to have eased. Will write a summary.
No single solution, therefore has to be flexible.
- New question about handling of inefficient jobs (via Raja)
JC-In mail
GS-Some jobs odd in Torque, which is then displayed in MonAMI.
- MB is concerned about move to new benchmark and how to publish two values (one old and one new).
Ticket status
***************
https://gus.fzk.de/download/escalationreports/roc/html/20090202_EscalationReport_ROCs.html
40954-Manchester-AF-hardware arrived yesterday. 96TB. In progress. AF to update ticket or blog entry.
45327-RHUL-DR-old cluster with out-of-date software, not enough resources to keep up to date
45397-OXFORD-waiting on response
45424-on hold-MyProxy
11:45
Quarterly reports (20')
[See them here: https://www.gridpp.ac.uk/deployment/status/reports/reports.html]
- Review of the draft reports (main points from each T2)
- Areas in need of updating
Scotgrid
GS-no real pressing issues
ECDF accounting broken from 1st week of December, now fixed.
Lack of effort at ECDF a concern; they are recruiting.
Steve Thorn covering at the moment.
Major upgrades at Glasgow and Durham done during quarter.
Durham storage should be green in Q1 2009.
Utilisation higher than others (62%); next closest is 38% (London).
Engineers at Glasgow do give a bit of an additional baseline.
ECDF can over-provide utilisation.
SouthGrid
PG-running quite well
New equipment into Oxford.
Main problem is exploiting clusters at Birmingham and Bristol. Getting there.
Lots of disk (140TB), most of which is empty (about 10TB used).
Bristol CPU: will get more on HPC but can't have it yet.
New twins to replace HEP twins.
JC-Cambridge should support SOUTHGRID VO
PG-Agree
Losing John Wakelin and Yves Coppen.
John leaves 13th Feb, Yves 18th Feb.
London
DR-short staffed
QMUL and RHUL still have no full-time admins.
At LeSC, admin leaving.
Imperial: people getting used to what to do.
SAM availability poor
ICHEP still has gLite 3
dCache SE probably 3.1
DR will look at it urgently.
JC-Brunel delivering half of the storage they pledged.
DR-More storage in machine room coming online.
JC-Some sites don't yet support London VO.
LeSC CE hanging. Virtual machines, not reliable.
QMUL interviewing staff
RHUL hired, but going through paperwork.
BD lost network connectivity so was unable to fill in the remaining discussion regarding reports.
12:05
Actions (05')
- Current status http://www.gridpp.ac.uk/wiki/Deployment_Team_Action_items
Need updating
12:10
AOB (05')
- Meet-o-matic request for February meeting still lacking responses!
Need responses.
JC-to follow up on COD shifts.
From chat window
[11:17:49] Graeme Stewart For user data replication enquiries, see
http://atlasuk.blogspot.com/2008/12/dataset-subscriptions.html
[11:18:32] Raja Nandakumar http://www.ja.net/services/video/agsc/services/evotelephonebridge.html
[11:18:42] Raja Nandakumar +44 (0)161 306 6802.