===============================
GridPP Cloud meeting 2013-05-24
===============================

Present: David Colling, Adam Huffman, Simon Fayer, Andrew Lahiff, Peter Love,
Andy McNab, Chris Walker, Wahid Bhimji, Robert Frank, Kashif Mohammad,
Pete Gronbech

CMS Status and Plans
--------------------

Andrew

- HLT:
  - networking has been upgraded
  - should be able to support 7000 running jobs
  - jobs running as of this morning (1,500)
  - network monitoring on its way
- UK:
  - testing with Oxford cloud
  - CRAB jobs failed, manual job submission does work
- David: we have reprocessing and analysis, no Monte Carlo production jobs yet

ATLAS Status and Plans
----------------------

Peter

- working towards stable production using IC cloud
- problems with keys (at IC) and with cloud scheduler (at UVic)
- ATLAS cloud meeting on 23rd
  - UVic has written a Squid discovery tool (Shoal), service for VMs so they can discover dynamically which Squid to use
  - at the moment using a hardwired proxy
  - Simon: doesn't CernVM already have this?
- Andy asked about the image - is it pre-prepared?
  - Peter: yes, it has lots of hardcoded stuff
  - Kashif - is it site-specific?
  - if there is a way of discovering Squids, then the need for hardcoding will disappear
- Next steps: scale things up
- Jeremy: asks about relation to ATLAS in general, and the July pre-GDB meeting?
- David: any plans for reprocessing or user analysis?
  - details not finalised yet

LHCb Status and Plans
---------------------

Andy McNab

- central work at CERN by Mario et al
  - produced unified VM configuration that works for DIRAC, works on clouds, 'vacuum' at Manchester and BOINC
  - quite large amount of CPU available this way
- Manchester testbed setup
  - running LHCb SAM tests on their testbed, using DIRAC
  - exactly same test used to validate tests for Monte Carlo
  - David asked if Andy had any performance measurements
    - Andy said no, the hardware is quite over-committed so performance probably not very good (using Xen, because CPUs don't have hardware virtualization)
    - he could do tests with more modern machines
  - writing documentation to encourage similar testing by other sites
- Andy was at LHCb ad hoc computing workshop
  - lots of discussion about virtualization in general
  - people had used Ibex at CERN and Rackspace cloud
    - people had double performance at Rackspace, possibly the result of clever VM tuning?
    - maybe floating point?
    - 15% hit for I/O intensive jobs
- aims to have documentation ready for HEP SYSMAN meeting, when he will ask for volunteers
  - can be run on a single machine, because they talk to each other but they're independent
  - best to run real jobs for performance testing
  - could be run at Imperial

GridPP Cloud status at Imperial
-------------------------------

- hardware reserved for benchmarking, not run yet
- make sure monitoring covers any instances, not just CMS
- make monitoring public?

AOB
---

- David: future plans?
  - quantifying performance
  - checkpointing/snapshotting
  - accounting
- Kashif asked about scheduling and authentication
  - identity switching work not finished yet
- Kashif mentioned EGI federated cloud in connection with monitoring
  - Kashif said they are already looking at it
- David asked who are the APEL experts
  - Jeremy mentioned people at RAL e.g. Alison Packer
  - Jeremy will ask Alison
- Andy McNab - where is the line about what can be accounted for as cloud work?
  - e.g. does BOINC work count as something for accounting? for whom?
  - if you have a special deal with Amazon, would that count? etc
  - can we extract a managerial statement about this?
  - Jeremy said there should be a discussion first
  - Peter said this shouldn't impede progress
  - Agreed it should happen in parallel
  - David asked Andy McNab to prepare an e-mail covering these points, for discussion at GridPP level and experiment level
- Jeremy: security? John Green is leaving...
  - Simon at Imperial may look at this
  - Kashif mentioned the need to keep images up to date
- David: Ian Bird talk at ACAT, he will distribute the link
- David: we will increase the size of the cloud at IC as long as we're not penalised by GridPP
  - Peter essentially asking whether there will be more money?
  - no...
- Pete Gronbech suggests the meeting should be regularly scheduled, even if David isn't there
  - Jeremy and Pete prepared to chair if needed

Actions
-------

1. Jeremy to contact Alison Packer re. cloud accounting
2. Adam to make public Imperial cloud monitoring, where possible
3. David to distribute link to Ian Bird's ACAT talk
4. Andy to send e-mail re. cloud accounting