Tuesday 14 June 2011

### 11:00  Meetings & updates (20')         

# ROD team update
Nothing reported.

# Nagios status
Nothing reported.

# Tier-1 update

Nothing significant.  
CREAMCEs being reinstalled and updated to latest gLite version.
A few problems with some disk servers.

# EGI OLA

Corrected link https://documents.egi.eu/public/ShowDocument?docid=31
Lighter availability & reliability requirements than WLCG, but possibility of site suspension if missing targets for 3 consecutive months.
There is a summary table of metrics on page 12.

# Security update

SSC5 questionnaire due out this week.
Security workshop planned for HEP Sys Man meeting.

# T2 issues

-- Availability for May: http://tinyurl.com/6fqwwc3. Are CREAM issues causing any of the problems for UCL-HEP (41%:28%), EFDA-JET (73%:49%) and Birmingham (87%:87%)?

EFDA-JET was having trouble with installing a new BDII and CREAMCE.
Birmingham was hit by a disk failure bringing the site down for ~ a week.
No direct link seen between CREAMCE and downtimes.

#  GDB last Wednesday

A few points from the meeting. Chris W’s summary is at http://www.gridpp.ac.uk/wiki/GDB_8th_June_2011
- Call for sites to decommission lcg-CE
- Call for registration and talks for WLCG workshop at DESY.
- Report on glexec. A dedicated mailing list exists to support tier 2s.
- EMI UMD 1.0 release scheduled for 4th July. UMD 1.1 5th September.
- Much discussion about whether or not glexec is the right way forward.

# GGUS tickets: http://tinyurl.com/3ulldrr


### 11:20  Experiment problems/issues (20')         

# LHCb
Nothing major.
LHCb not currently running many jobs on tier 2s.

# CMS
Generally OK, but problem with low-efficiency jobs on tier 1 (but that’s a CMS problem).

# ATLAS
Quite quiet last week.  See separate ATLAS report.
Glasgow accepts work from other clouds. Should other sites do the same? It seems that being a T2D is a step, but it is not clear how to proceed further.  Action: AF to find out.

# Other
Storage group is polling sites about space usage by smaller VOs to improve data consistency, etc.

# Site performance/accounting issues
Durham is behind in publishing.

# Metrics review
Still ongoing.


### 11:40 Is my site getting enough work? (10')

Over the last 2 months many sites have done a higher percentage of work than their HEPSPEC06 percentage, but generally not far off.  This is balanced by some apparently underperforming sites, particularly Lancs and ECDF, but both are shared clusters that currently publish the full capacity of the cluster while typically less is available for the grid.  Some lively discussion about the best way to publish sites in cases like this.  There seems to be almost a consensus for crediting work done rather than nominal capacities, but this could run on...
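
To make the comparison concrete, here is a minimal sketch of how a site's share of delivered work can be set against its share of published HEPSPEC06 capacity. The function name `share_ratios`, the site names and all numbers are hypothetical placeholders for illustration, not figures from the accounting data discussed in the meeting.

```python
def share_ratios(capacity_hs06, work_done):
    """Return {site: (capacity %, work %, work/capacity ratio)}.

    capacity_hs06: published HEPSPEC06 per site (assumed non-zero)
    work_done: delivered work per site over the same period, in any
               consistent unit (e.g. HS06-hours)
    """
    total_capacity = sum(capacity_hs06.values())
    total_work = sum(work_done.values())
    result = {}
    for site, capacity in capacity_hs06.items():
        capacity_pct = 100.0 * capacity / total_capacity
        work_pct = 100.0 * work_done.get(site, 0.0) / total_work
        # A ratio above 1 means the site did more than its published share.
        result[site] = (capacity_pct, work_pct, work_pct / capacity_pct)
    return result


# Illustrative numbers only (hypothetical sites, not real accounting data).
capacity = {"SiteA": 1000.0, "SharedSiteB": 3000.0}
work = {"SiteA": 500.0, "SharedSiteB": 500.0}
for site, (cap_pct, work_pct, ratio) in share_ratios(capacity, work).items():
    print(f"{site}: capacity {cap_pct:.0f}%, work {work_pct:.0f}%, ratio {ratio:.2f}")
```

A shared cluster that publishes its full capacity while offering less to the grid shows a ratio well below 1 in this kind of comparison, even if the grid-available portion is fully used.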

- Understanding the current status using links here:
 http://www.gridpp.ac.uk/wiki/Links_Monitoring_pages


### 11:50 General discussion (05')

# New thoughts on the network topic

Janet links have to run at an average of 70% of line rate for 3 months before they will be upgraded.

Planning to get network usage monitoring in place in preparation for a measurement period so we can get some figures on networking from sites.
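
As a rough illustration of the 70% criterion, the sketch below averages link throughput samples over a measurement period and compares the result with a fraction of line rate. The function names, the sample values and the 1 Gbit/s line rate are assumptions for illustration; how Janet actually computes the average is not stated in these minutes.

```python
def average_utilisation(samples_mbps, line_rate_mbps):
    """Mean throughput as a fraction of line rate.

    Assumes the samples are equally spaced over the measurement period.
    """
    if not samples_mbps:
        raise ValueError("need at least one throughput sample")
    return sum(samples_mbps) / len(samples_mbps) / line_rate_mbps


def meets_upgrade_threshold(samples_mbps, line_rate_mbps, threshold=0.70):
    """True if average utilisation over the period reaches the threshold."""
    return average_utilisation(samples_mbps, line_rate_mbps) >= threshold


# Hypothetical samples for a 1 Gbit/s (1000 Mbit/s) link.
samples = [650.0, 720.0, 810.0, 700.0]
print(average_utilisation(samples, 1000.0))
print(meets_upgrade_threshold(samples, 1000.0))
```

In practice the samples would come from whatever usage monitoring sites put in place, collected over the full 3-month period.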

# Progress/observations with glexec

Nothing new.

# Specific problems encountered in last week

Nothing reported.


### 11:55  Actions (05')         

http://www.gridpp.ac.uk/wiki/Deployment_Team_Action_items (will follow up offline)


### 12:00  AOB (01')         

Nothing.



### Chat:
[10:59:27] Mark Mitchell joined
[10:59:33] Brian Davies joined
[10:59:46] Raja Nandakumar joined
[10:59:58] Wahid Bhimji joined
[11:00:15] Elena Korolkova joined
[11:00:24] Mark Mitchell Back in 5
[11:01:10] Santanu Das joined
[11:01:31] Jeremy Coles Rob will take minutes today.
[11:01:36] Stephen Jones joined
[11:01:50] Stuart Purdie joined
[11:02:08] RECORDING Rob joined
[11:02:41] David Crooks joined
[11:02:57] Mark Slater joined
[11:03:43] Mingchao Ma joined
[11:04:39] Mohammad kashif joined
[11:05:58] Catalin Condurache cannot access it
[11:06:06] Catalin Condurache what login should we use?
[11:06:24] Mingchao Ma I can't see it either
[11:06:25] Rob Harper I was able to get it this morning
[11:06:31] Pete Gronbech joined
[11:07:20] Chris Brew joined
[11:07:38] Jeremy Coles The link now: https://documents.egi.eu/public/ShowDocument?docid=31
[11:12:26] Alessandra Forti joined
[11:12:52] Stephen Jones left
[11:13:08] Stephen Jones joined
[11:15:34] Stuart Wakefield joined
[11:15:49] Jeremy Coles http://pprc.qmul.ac.uk/~lloyd/gridpp/samplots.html
[11:26:40] Govind Songara joined
[11:28:39] Wahid Bhimji left
[11:30:41] Wahid Bhimji joined
[11:32:07] Queen Mary, U London London, U.K. joined
[11:37:24] Brian Davies https://twiki.cern.ch/twiki/bin/view/EGEE/FtsRelease22 , srm/guc split now applied to UKT2-RAL channels. should increase transfer rates from ukt2s to ral without increasing the number of concurrent transfers
[11:39:47] Elena Korolkova I have a fire alarm
[11:40:01] Elena Korolkova so leaving the meeting
[11:40:05] Jeremy Coles ok - hope it is a test!
[11:45:03] Ewan Mac Mahon joined
[11:47:51] Ewan Mac Mahon Sorry; just got here - what URL are we looking at?
[11:48:01] Alessandra Forti indeed
[11:49:02] Sam Skipsey For ECDF, it *is* because it's a shared cluster - the published amount is the total resource.
[11:50:38] Jeremy Coles The GIF attached to the agenda Ewan.
[11:59:45] Elena Korolkova I'm back
[12:06:17] John Bland don't forget ecdf+lancs are skewing the % figures
[12:06:32] Matthew Doidge true
[12:08:12] Queen Mary, U London London, U.K. QMUL's figure for CPU includes CPU that came on a couple of weeks ago.
[12:08:17] Wahid Bhimji I think you should just credit work done - but the same goes for not including existing disk - if it doesn't get work done then whats the use
[12:09:37] Raja Nandakumar Apologies - got to go ...
[12:09:42] Raja Nandakumar Bye ...
[12:09:47] Raja Nandakumar left
[12:11:06] Alessandra Forti I agree with you wahid
[12:11:16] John Bland wahid gets my vote!
[12:11:23] Ewan Mac Mahon Disk is slightly different - there is a state at the moment where most idle disk is idle because the VOs aren't using it, whereas they seem to be able to saturate CPU unless there is a problem with it.
[12:11:27] Alessandra Forti disk space is empty and has 100 weight
[12:12:11] Alessandra Forti yes but the management shouldn't be surprised if sites with smaller disk capacity do still more work.
[12:12:14] Ewan Mac Mahon But the same principle arguments apply; if a site has lots of disk but it's uselessly crap disk, it shouldn't get credit for just sitting there.
[12:15:08] Chris Brew left
[12:18:39] Brian Davies left
[12:18:41] Mark Slater left
[12:18:41] Govind Songara left
[12:18:41] John Bland left
[12:18:41] David Crooks left
[12:18:42] Andrew McNab left
[12:18:43] Alessandra Forti left
[12:18:44] Mingchao Ma left
[12:18:45] Catalin Condurache left
[12:18:45] Sam Skipsey left
[12:18:49] Mohammad kashif left
[12:18:49] Santanu Das left
[12:18:51] Mark Mitchell left
[12:19:00] Ewan Mac Mahon left
[12:19:09] Stuart Purdie left
[12:19:46] Stuart Wakefield left
[12:19:52] Matthew Doidge left
[12:19:53] Stephen Jones left