UKI Monthly Operations Meeting (TB-SUPPORT)

GMT
EVO

Description
Monthly review and discussion meeting for those involved with GridPP deployment and operations. To join via EVO go to http://evo.caltech.edu. To join by phone call +41 22 76 71400. The phone bridge ID is 409429 and the code: 4880.
Present
- Andrew Elwell
- Alessandra Forti
- Brian Davies
- Chris Brew
- Ewan Mac Mahon
- Gianfranco Sciacca
- IPP1 Durham - David Ambrose
- Jeremy Coles
- John Bland
- Jon Wakelin
- Paul Hodgson
- Peter Love
- Phone Bridge *2 - John TCD
- Pete Gronbech
- Sam Skipsey
- Santanu Das
- Simon George
- Stephen Childs
- UCL Hep - Gianfranco / Will Hayes
- Yves Coppens

Site issues
===========

Steve Lloyd tests
-----------------
- TCD - Network outage (work in progress)
- UCL / RHUL disk figures (2355TB) - no one available to discuss
  - Simon thinks the RHUL ones are still wrong - perhaps it's the wrong information provider plugin? (but it's not simply 10^3 more)
  - UCL-HEP has a very small amount of storage. They plan to get a small amount more.

Accounting
----------
- QMUL not up to date
- EFTA-Jet also have a gap (need to republish?) - they have just replaced their CE - related?
- Oxford - Work in progress... (new CEs)
- Bristol - Were running ATLAS / ALICE until May - reason for stopping? Will look into it.
  - they'd have been on CE1 (not yet supported on CE2)
  - old site too small to validate for ATLAS - GS will assist with production when new site ready

New Steve Lloyd test pages
--------------------------
- See the 3 linked URLs in agenda.
- General discussion of the UK Grid results - mostly running at RALPP, ~4% failure
- Glasgow migrating from RB to SL4 WMS (SL3 WMS sick)
- FCR page - CMS blacklist sites fairly aggressively
  - RHUL failing on SRMv2 tests perhaps - Space Tokens? (no CMSDEFAULT)

UI Provision
------------
- Nearly all sites have an installation (either dedicated or tarball on desktops)
- Point to user docs URL

Experiments
===========
CCRC roundup - see URLs

ATLAS
- If you see nasty ATLAS processes, let the production people know rather than killing the job, so they can debug properly. Raise GGUS tickets.
  -- RAL ones had somehow prevented ps from completing
  -- Liverpool ones were over walltimes - single zombie job that had hung. Killing pbsmom cleared them off OK
- Expect the unexpected with user jobs - Block DNs in extremis.
- Working round possible file access (LAN @ Birmingham) issues.
- There's now a UK-specific Savannah portal

HEPSYSMAN
=========
http://hepwww.rl.ac.uk/sysman/june2008/agenda.html

[ BIG CHUNK MISSING AS $VENDOR_ENGINEER ONSITE ]

- Plain grid proxies out, VOMS proxies in pre-production / middleware testing - will affect the UK as Barry has installed a CREAM CE at IC.
- Storage status in the UK - See links on agenda page

WLCG matters
============
Tier-2 reps report onto wiki.
- Security Policy
- Benchmarking
- Pilot Jobs - under review still (glexec - see HEPSYSMAN)
  - LHCb approved soon? DIRAC + glexec testing in progress. Will be at Tier1 centres. Possible risk as could mean banning entire LHCb if problems.
- CMS queuing jobs stop LHCb at RAL (middleware problem + RAL decision) as they don't have a queue per VO. In progress.
- Daily WLCG ops meeting (14:00 ~ 14:10 UK time) - join if any issues.

Discussion
==========

Storage
-------
- DPM: Space token management in 1.6.7 is an issue - that's one big plus for .10 (bugfix from .7 altering the retention time to short-finite from unlimited)
  - qmul se02 is still 1.6.7-1
- dCache: Liverpool leaving alone as it's working

Other gLite Packages
--------------------
Review at next meeting

AOB
===
- Please deploy the gridpp VO
- Please join the gridpp-users list for information dissemination (low volume)