ScotGrid Technical Meeting 2006-09-21
-------------------------------------

Present: Graeme, David (Glasgow); Steve, Greig (Edinburgh); Mark, Phil (Durham)

Agenda:

1. VO Configuration Updates

  + VOMS enabling Pheno and BaBar          Graeme
  + New VOMS roles for LHCb

    VOMS for pheno. URL is here:
    http://www.phenogrid.dur.ac.uk/howto/config
    LHCb as per LCG_ROLLOUT postings.

    Need to rerun config_mkgridmap on the CE, and config_vomses on the UI
    if you support these VOs logging in. Need to copy the
    voms.gridpp.ac.uk public key into /etc/grid-security/vomsdir.

    Additionally there's a minor gLite upgrade.

    ACTION: Graeme to summarise steps taken on scotgrid-gla for other
    sites.

    Discussion on how awkward it is to enable and reconfigure VOs for
    sites, prompted by Steve's post to ScotGrid-TechBoard this morning.
    General agreement that better tools are needed, though it wasn't
    clear quite what - certainly something more automated.

2. Site Status

  + Durham                                 Mark
    Nothing to report.

  + Edinburgh                              Steve
    dCache has been reconfigured successfully. See Greig's blog posting:
    http://scotgrid.blogspot.com/2006/08/we-have-finally-killed-off-our-dpm-at.html

  + Glasgow                                Graeme
    Summary of issues with the new cluster:

    CVOS is useful, but has run into lots of problems with SL3 32 bit
    compatibility. Worker node image now done and deployable.

    Grid servers being maintained in the "old way" - kickstart to a base
    install, then YAIM "by hand" - will probably try to script this a
    little...

    Disk server nodes have ARECA cards not supported by vanilla SL - will
    use SLC43 i386 as base here. Should be ok. Again CVOS is useful - 10
    nodes which are cloned...

    Many issues outstanding about getting ganglia and IPMI to work, but
    these are secondary.

    Will run batch system independently of YAIM.

    Not going to EGEE conference because of these ongoing issues :-(

3. Cross Site Support Issues

  + What have we found out about site policy?    All
    Graeme has the blessing of the site security people to grant root
    access to others.
    Nigel generally sympathetic, but Mark to chase this again, and get
    his support, before approaching site security people. Steve thought
    there would be no problem at Edinburgh.

  + What are the feelings of site admins?
    General concern that more harm than good might be caused. This
    facility should be used to provide cover when no one from the site
    is available, not to provide out of hours cover. Noted that T2
    support levels are not 24x7!

  + Technical Implementation
    Agreement that implementation should consist of:
      1. ssh access from a limited number of hosts - preferably hosts
         that only the sys admins have access to.
      2. Login as normal users - ssh key acceptable.
      3. sudo to a root shell. Password can be the same on each foreign
         site?

4. New Site Functional Tests

  + Issues in moving to OPS VO
    Sites look ok now - Edinburgh still failing RM tests. See
    http://scotgrid.blogspot.com/2006/09/edinburgh-have-been-failing-replica.html
    and
    http://scotgrid.blogspot.com/2006/09/some-notes-on-switching-to-ops-vo-for.html
    for issues arising.

    Greig will post the RM test problem to the storage group. Other
    dCache sites in the UK are passing this test.

    General issue of how sites debug problems when they are not in the
    affected VO - Edinburgh's 3rd party replication is a case in point.
    Should one person in dTeam be allowed to join OPS to help sites
    debug problems?

    ACTION: Graeme to raise this in dTeam.

  + LFCs broken at Glasgow and Durham
    Glasgow have stopped publishing their local LFC, but SAM tests are
    still trying! Although Glasgow removed their LFC from the BDII, the
    SAM tests are still running against it and still failing. As no one
    is using the local LFC right now this will be marked as
    "non-relevant".

    Graeme advised concentrating on the CE and SE critical tests on the
    SAM portal.

    Some discussion on ephemeral RGMA/APEL test failures.
    Graeme reported a strong correlation between failures at each of the
    sites, leading him to believe that these were not site specific
    problems and should be marked as non-relevant. Important to do so
    because one day a bean counter will add up the numbers.

5. Documentation

  + ScotGrid Wiki vs. GridPP version
    Is the ScotGrid wiki redundant? Yes. Use the GridPP wiki instead.

    ACTION: Graeme to do a "ScotGrid Dashboard" page on the GridPP wiki,
    replicating the (only?) useful page on the old ScotGrid wiki.

  + Blog
    It's good. Use it.

6. Open Actions

  + ScotGrid Distributed Storage           Graeme/Greig
    ACTION: Graeme to kick Graeme's arse on this.

7. AOB

  + 1/4 Reports - storage figures
    Graeme will distribute a better tool next week.

  + Edinburgh gridmon box
    Steve to contact Mark Leese to see if he can get any life out of the
    box. Otherwise a site visit will need to be arranged.

  + Biomed data challenge - need inbound access to WNs for flex
    (bleugh...)
    Graeme noted they needed inbound IP access to WNs, so we can't take
    part.

  + Changes in reporting period - fill in reports Mon/Tue.
    Ops meeting has moved to Thursdays. Sites will fill in the report
    for the previous week (Monday->Sunday) on Monday/Tuesday. ROCs can
    aggregate issues on Wednesdays.

    Edinburgh not having trouble filling in the site report, but no harm
    in Graeme also being listed as a site-admin to let him do it, and
    also browse the report.

    ACTION: Steve to add Graeme as Edinburgh site admin.

  + Transfer Tests
    Durham had been asked to move their transfer test forward. This is
    not possible given the number of people Mark has to notify when a
    test is occurring, but he should be OK for early next week.
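
Note on item 3 (Technical Implementation): the agreed scheme of key-based ssh from a limited set of hosts, followed by sudo, could be sketched roughly as below. This is only an illustration, not something agreed at the meeting. It assumes OpenSSH's from="..." option in authorized_keys and a plain sudoers rule; the host names, user name and key are hypothetical placeholders.

```python
"""Sketch of the cross-site access scheme: key-based ssh from a few admin
hosts, then sudo. All host names, user names and keys are hypothetical."""


def authorized_keys_line(public_key, allowed_hosts):
    """Return an authorized_keys entry that only accepts the key when the
    connection comes from one of allowed_hosts (OpenSSH from="..." option)."""
    if not allowed_hosts:
        raise ValueError("need at least one admin host")
    for host in allowed_hosts:
        # Commas, quotes and whitespace would break the option's pattern list.
        if not host or any(ch in host for ch in ', "\t\n'):
            raise ValueError("bad host pattern: %r" % (host,))
    return 'from="%s" %s' % (",".join(allowed_hosts), public_key.strip())


def sudoers_line(user):
    """Return a sudoers rule letting user run any command (so also a root
    shell via 'sudo -s') after authenticating with sudo's default settings."""
    if not user.isalnum():
        raise ValueError("bad user name: %r" % (user,))
    return "%s ALL=(ALL) ALL" % user


if __name__ == "__main__":
    # Placeholder key and hosts, for illustration only.
    key = "ssh-rsa AAAAEXAMPLEKEY remote-admin@example"
    hosts = ["admin1.example.org", "admin2.example.org"]
    print(authorized_keys_line(key, hosts))
    print(sudoers_line("remoteadmin"))
```

The first line would go in the visiting admin's ~/.ssh/authorized_keys on the foreign site, and the second in sudoers (edited with visudo). Whether the sudo password should be shared across sites is the open question from the meeting and is not addressed here.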