Present
=======

Alessandra, Andrew, Jeremy, Mingchao, Pete G, Graeme, Frederic, David C, Duncan, Greig, Derek, Steve L

Experiments issues
==================

* LHCb is doing a migration to CASTOR. Raja is away.
* CMS: not much news, but they are coming to RAL next week to talk to the Tier1. CASTOR testing is the biggest concern at the T1; at the T2s MC production is ongoing. There are still problems with slow software installs. Dave Newbold will move back to do some physics and David C will take over the operations role. Chris Brew will be more involved in the CASTOR testing as T1 rep.
* ATLAS: problems that were thought to be due to RAL CASTOR were actually due to DDM, because the DB machine had inadequate hardware. The machine has been changed and the number of parallel streams in FTS has been improved. This improved things a lot during the weekend. From the Tier2 point of view, Oxford, Durham and IC-HEP have been validated. There is a problem at NeSC with the environment. The hope is to validate all the sites by the end of the week. There are problems with QMW and UCL-Central due to an old DPM version. QMW has a new test DPM and UCL-Central is going to upgrade.
* SuperNEMO is waiting to be enabled at sites. Manchester will do it before the end of the year; Oxford is doing it; other sites in London are doing it or are going to. Glasgow has already enabled it and Durham will do so as well. ECDF has problems with account allocation because of the centralized management of the grid.
* GridPP has a short name.
* vo.scotgrid.ac.uk has been enabled, so the ScotGrid users will be moved from gridpp to the regional VO. Discussion about regional VOs, AUP and registration in EGEE.
* John is still following up on the biomed issues.

Urgent Tickets
==============

Various tickets for London sites, possibly related to SAM test failures. An LFC that is not used should be removed; if you want to keep it, switch off its monitoring in the GOCDB. It is not clear whether this works; alternatively, contact the SAM team to ask how it works. Apparently there is no LFC registered in the GOCDB.
The tickets were considered irrelevant and the COD was asked to close them. The SAM DB lags behind the BDII, so things removed from the BDII might take longer to disappear from SAM.

Ops Meeting
===========

The ops meeting didn't go ahead.

A new release has been given to PPS. A new production release is expected this week. It includes job wrapper tests without the R-GMA dependency.

Request for sites to test the new version of AMGA (gLite metadata server).

SGE problems with APEL in Germany. Ticket references were passed to London. Dave was fixing things in his repository because it was quicker, but now the fixes are not in production.

CMS has problems with WMS scalability. Catalin has installed a WMS instance at RAL. WMS tests at IC are ongoing on SL4. pheno complained about the RB at RAL.

Is direct job submission a problem? No, but it might become one if the new CE doesn't support Globus.

Panda monitoring might have scalability problems. At least the monitoring server is having problems. Is it possible to run regional Pandas? Yes, but you lose the global scheduling.

Site Reviews
============

RAL is failing Steve's tests. The RAL RB is having problems and all the sites are failing. Maybe Steve can also try out Imperial for his tests. The monthly availability required to be green increases every month, which is why it looks inconsistent.

Sheffield has greatly improved, Cambridge is stable, and London sites are having problems.

Birmingham: discrepancy between accounting records due to multiple CEs. Sites have to take the latest APEL rpms. Has anyone run Yves' script to check for inconsistencies?

Security issues
===============

Comments on the incident response document. It is still the standard JSPG document. The security officer email is security-officer@gridpp.ac.uk. There are no comments.

Availability workshop
=====================

Experiments want to install their own monitoring services, perhaps on the VO boxes, and perhaps other software because they are fed up with the middleware.
Apparently it was a very successful workshop.

Current Activities
==================

Alessandra: is restructuring cfengine in Manchester to simplify it for the local sys admins. She has already restructured the Kickstart server and is now rewriting the cfengine configuration to give it a sort of hierarchical structure. As it is, the usefulness of cfengine is basically defeated.

Mingchao: glexec?

Pete: Bristol has LHCb problems, Yves is having problems with a very buggy SL4 CE, and the JET site has upgraded to SL4. He wants to throw away the BaBar farms but he can't. Oxford is enabling new VOs.

Duncan: dealing with tickets and problems. Helped Graeme to get Panda going. Brunel is adding new VOs and building a couple of new hosts. RHUL has problems with the SRM and is getting ready for the new cluster.

Derek: move to CASTOR, SRM 2.2; Catalin is working on the gLite WMS.

Steve: trying to upgrade the tests to release 13.

Actions review
==============

Andrew: net mon boxes. Only IC is missing now. Kostas doesn't want it. Pete is asking if it can be installed at the JET site.

AOB
===

What to do when the VOMS certificate expires? Update the GridPP web page and send a broadcast to sites. The update should be in January.

Koala window
============

[11:07:25] Greig Cowan is someone taking minutes?
[11:08:24] Jeremy Coles Alessandra
[11:08:36] Greig Cowan great
[11:29:36] Derek Ross I think its supposed to be about 3 days
[11:39:39] Steve Lloyd joined
[11:54:16] Graeme Stewart i have to go at 12.
[11:59:14] Graeme Stewart ok, gotta go.
[11:59:17] Graeme Stewart ttfn
[11:59:21] Graeme Stewart left
[11:59:54] Greig Cowan me too
[11:59:57] Greig Cowan left
[12:08:32] Andrew Elwell sorry - Didn't hear you say name
[12:17:23] Frederic BROCHU left