Ops team meeting minutes 2012-5-1

http://indico.cern.ch/conferenceDisplay.py?confId=187806

Attending: Stuart Wakefield, Gareth Roy, Brian Davies, Ian Collier, Govind Songara, Elena Korolkova, Mark Slater, Stuart Purdie, Andrew McNab, Andrew Washbrook, John Bland, 
Catalin Condurache, Gareth Smith, Daniela Bauer, Mingchao Ma, Alessandra Forti, Wahid Bhimji, Sam Skipsey, Rob Harper, Ewan Mac Mahon, Santanu Das, Christopher Walker, 
Rob Fay, Mohammad Kashif, Emyr James, Matthew Doidge, Pete Gronbech, Chris Brew, Duncan Rand, Jeremy Coles.

Apologies: Raul, Mark

Experiment problems/issues

CMS - Brunel network issues - OK now. Imperial had a CE problem that was aborting user jobs. Brunel SE had become full; CMS removed unneeded data. Stuart thought Brunel had enough space.

LHCb - no report
	
ATLAS - Not much to report since last week. Two tickets - one at RALPP, one at UCL. UCL downtime on-going. Brian: Issue at RALPP due to change of FTS copy command to SRM-copy. 
Waiting for Thursday ATLAS SE functional tests to see what happens. Was the site informed? Yes. 
Proddisk filled - data now redirected into datadisk. Monthly availability report (see agenda page). Seven sites below 90% - some sites still need to send Alessandra their explanations. 
Also uploaded history of ABCD classification over the last year or so. Poor reliability over recent months - many network upgrades. ATLAS do make some adjustment for downtime, 
so sites should argue their case in their report.

Other VO issues - Chris. Ticket opened by Pheno was thought to be fixed. T2K now seeing the problem but RAL can reproduce it so hopefully things will be fixed. 
Imperial had a similar problem - solved by using CERN myproxy server. Site issues: T2K has requested to run at Manchester. Instructions for space-token requested. 
dzero can now use CREAM CE. Discussion of problem for small VOs whose jobs don't run before their proxy expires. 

Meetings & updates

Jeremy introduced his Ops bulletin: https://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest

First two columns are related to issues which might be updated on a weekly basis. Green headings are core tasks. Bottom half of page contains meeting summaries, e.g. GDB, Ops, PMB etc. 
Questions regarding the necessity to update it in good time before a meeting. Trial it in May. Going through it today:

Tier-1 report: no comments on Tier-1 report. 
Accounting: No change in accounting except Steve's page http://pprc.qmul.ac.uk/~lloyd/gridpp/hs06.html has been updated with HS06 GR columns. 
Documentation: grid user crash course. 
Interoperability: EMI WN tarball now available - comments please, Daniela has commented already. 
Rollout: Not every site running latest SL5 OS. 
Services: March and April network utilisation figures requested. 
Tickets: Matt went through the tickets. Ongoing problems at Durham - needs further investigation. Oxford disk server loss: What will smaller VOs do when they lose data? 
The latest patch for DPM (1.8.3) was still problematic at Glasgow - problem with globus threading affecting DPM - now understood. 

Hardware has arrived at Lancs for nagios backup. 

Review of April GDB

GDB - Michel indicates he'd like to use pre-GDB more. 
EMI-2 release meant to be 7th May. UMD release follows that. 

Reminders: 

- HEPSYSMAN meeting
- Meeting for core ops people tomorrow (Wed)

AOB

Ewan: do we know how to certify a new site (ref Sussex)? See the page pointed to by Steve.

[11:03:50] Catalin Condurache joined
[11:03:52] Emyr James joined
[11:03:54] Mark Slater joined
[11:03:56] Pete Gronbech joined
[11:03:58] Jeremy Coles joined
[11:03:59] Matthew Doidge joined
[11:04:12] Gareth Smith joined
[11:04:14] Rob Harper joined
[11:04:27] Jeremy Coles Duncan is taking minutes
[11:04:38] Andrew McNab joined
[11:04:59] Brian Davies joined
[11:05:04] Chris Brew joined
[11:09:12] Santanu Das joined
[11:09:26] Santanu Das Sory
[11:09:33] Santanu Das I'm late
[11:15:18] Chris Brew Just sent you the RALPP reason
[11:16:28] Gareth Roy joined
[11:16:45] Sam Skipsey joined
[11:17:16] Jeremy Coles ATLAS report: https://indico.cern.ch/materialDisplay.py?contribId=0&materialId=0&confId=187806
[11:23:04] Ian Collier joined
[11:25:30] Daniela Bauer But why not start with 24 h proxies to start with ? That gives the system some breathing room.
[11:26:50] Jeremy Coles https://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest
[11:30:16] Ewan Mac Mahon This does seem to overlap a bit in scope with the minutes of this meeting.
[11:30:48] Rob Harper First impression is pretty good: one stop shop is nice.
[11:30:58] Matthew Doidge That's always a problem with tickets, people keep fixing things!
[11:31:19] Ewan Mac Mahon The blackguards!
[11:32:09] Matthew Doidge Is there scope for archiving the bulletins?
[11:32:47] Ewan Mac Mahon Well, there's the wiki history, at least.
[11:32:55] Jeremy Coles yes.
[11:32:58] Santanu Das What's the command to get the Storage used per VO?
[11:34:01] Duncan Rand http://pprc.qmul.ac.uk/~lloyd/gridpp/hs06.html
[11:34:57] Sam Skipsey Santanu: the admin toolkit's dpm-sql-usage-by-vouser will do that for you.
[11:36:00] Santanu Das looks like I haven't got installed admin toolkit - what's the name of the package?
[11:40:35] John Bland why upgrade when it works?
[11:41:49] Sam Skipsey Santanu: dpm-contrib-admintools (it's in the dpm devel repository, for reasons best known to Ricardo)
[11:42:24] Daniela Bauer This is the raw data (not all sites though, because it takes a while to run)
[11:42:26] Daniela Bauer http://www.hep.ph.ic.ac.uk/~dbauer/grid/public_logs/emi_deployment/120427/wn.txt
[11:43:40] Matthew Doidge Lancaster's got some crusty glite that needs sorting out
[11:44:11] Daniela Bauer I know  
[11:44:38] Daniela Bauer You can see what everone else is running and pick your favourite.
[11:45:30] Santanu Das Sam: I don't see any devel repo for UMD, only base and update
[11:48:10] Sam Skipsey there probably isn't a devel repo in UMD, it's kinda against the concept of UMD itself...
[11:48:52] Sam Skipsey The DPM page for it is: https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Dev/Components
[11:54:45] Christopher Walker Sorry.
[11:55:42] Wahid Bhimji dpm-contrib-admintools is in the main release now since 1.8.3 I thought
[11:56:40] Christopher Walker re Bristol: One of the reasons for the last upgrade I did was to demonstrate it worked. 
[11:56:52] Sam Skipsey 1.8.3 isn't in UMD, though, Wahid
[11:59:49] Matthew Doidge Thanks Chris, I'll let Winnie know
[12:01:03] Christopher Walker I'm sure I've already told Winnie...
[12:01:28] Matthew Doidge Working on the nagios backup, as we've moved to a shared machine room now there was paperwork I was unaware of that has to be filled in "approved" by a committee
[12:01:30] Christopher Walker But doesn't help to tell her again - we've been up with very few problems since.
[12:03:42] Christopher Walker Sorry - I mean no harm telling her again. 
[12:05:22] Ewan Mac Mahon The EGI tests are still bloody useless however, regardless of what they are 'seeking'.
[12:09:54] Alessandra Forti you add them to gocdb and bdii it should be automatic
[12:10:09] Stuart Purdie https://wiki.egi.eu/wiki/PROC09#Resource_Centre_Registration_and_Certification_Procedure
[12:10:27] Alessandra Forti perhaps you need to add them to nagios too these days
[12:11:41] Stuart Wakefield left
[12:11:46] Gareth Roy left
[12:11:47] Brian Davies left
[12:11:47] Ian Collier left
[12:11:48] Govind Songara left
[12:11:48] Elena Korolkova left
[12:11:49] Mark Slater left
[12:11:49] Stuart Purdie left
[12:11:49] Andrew McNab left
[12:11:50] Andrew Washbrook left
[12:11:51] John Bland left
[12:11:52] Catalin Condurache left
[12:11:53] Gareth Smith left
[12:11:53] Daniela Bauer left
[12:11:54] Mingchao Ma left
[12:11:56] Alessandra Forti left
[12:11:58] Wahid Bhimji left
[12:12:16] Sam Skipsey left
[12:12:48] Rob Harper left
[12:15:18] Ewan Mac Mahon And copy me in.....
[12:16:08] Ewan Mac Mahon https://gocdb4.esc.rl.ac.uk/portal/index.php?Page_Type=View_Object&object_id=25731&grid_id=0
[12:16:10] Santanu Das left
[12:16:15] Ewan Mac Mahon ^ SUSX's gocdb entry
[12:16:21] Christopher Walker Emyr - hang around for a sec
[12:16:33] Rob Fay left
[12:16:36] Mohammad kashif left
[12:18:15] Ewan Mac Mahon left
[12:25:32] Matthew Doidge left
[12:26:41] Chris Brew left