GridPP Cloud project meeting 1st March 2013

Present

David Colling, Pete Gronbech, Andrew McNab, Kashif Mohammad, Peter (?), Wahid Bhimji, Matt Doidge, Robert Frank, Chris Walker, Andrew Washbrook, John Green, Linda Cornwall, Robin Long, Andrew Lahiff

Actions from the previous meeting

  1. David to report on EGI conference call
    1. Agreed to join
    2. Not started yet
    3. Appears to be very much "grid on cloud" e.g. uses GLUE2
    4. We will be involved in their various demos
    5. David asked for Kashif's experiences:
    6. Kashif said he had some involvement but not much
    7. A cluster was installed at OeRC, though it wasn't part of the federated cloud
  2. Adam to create a store area for Andrew - DONE
  3. Andrew to send link for dashboard - DONE
  4. Documentation of the Imperial-based pilot on the wiki - STARTED, ongoing
  5. Wahid to write a plan for storage testing - DONE
  6. Cloud community - ongoing
  7. David to distribute a link to GDB discussions - DONE

Experiments

LHCb - Andy McNab

  • LHCb DIRAC has agreed to use VMDIRAC once it is more mature
  • Mario has access now to the GridPP cloud at Imperial
  • Experiments in Manchester with virtualization and CernVM image
    • E.g. reducing the number of cores and the amount of RAM in a VM while it is running
    • Specify a shutdown time using the HEPiX protocol, e.g. if rack maintenance is to occur in a week's time
    • As jobs expire and the VM uses less CPU time, reduce the number of cores it's using
  • It is possible to have a shared filesystem between the hypervisor host and VM e.g. for contextualization
  • Want to ensure APIs are identical between standalone and cloud images
  • Using CernVM because HEPiX tools are installed
  • Some discussion of Boxgrinder - MAJH to put documentation on wiki
  • CernVM to move away from Conary in version 3
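
The Manchester experiments listed above (shrinking a running VM as its jobs expire, and honouring an advertised shutdown time) can be illustrated with a minimal sketch. This is not the code used in Manchester and does not use the HEPiX tools themselves; the function names, the one-core-per-job assumption and the minimum of one core are all hypothetical, chosen only to show the decision logic.

```python
from datetime import datetime, timedelta


def target_cores(running_jobs, current_cores, min_cores=1):
    """Return how many cores the VM should keep (hypothetical policy).

    Assumes one core per running job. The VM is only ever shrunk here,
    never grown, and always keeps at least min_cores.
    """
    return max(min_cores, min(running_jobs, current_cores))


def can_start_job(now, shutdown_time, expected_runtime):
    """Return True if a job started now would finish before the shutdown time."""
    return now + expected_runtime <= shutdown_time


if __name__ == "__main__":
    # Example: an 8-core VM with 3 jobs left, maintenance due in a week.
    now = datetime(2013, 3, 1, 12, 0)
    shutdown = now + timedelta(days=7)
    print("cores to keep:", target_cores(3, 8))
    print("start a 2-day job?", can_start_job(now, shutdown, timedelta(days=2)))
```

In a real setup the shutdown time would come from the site (for example via the contextualization mechanism mentioned above), and the actual core and RAM changes would be applied by the hypervisor rather than by the VM itself.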

ATLAS - Peter

  • BNL and CERN have working cloud setups
  • Testing on Imperial setup
  • Also looking at RAL and ECDF
  • Want to spin up lots of instances
  • HTCondor is the core component on which they're relying
  • Need to understand the tools better e.g. Boxgrinder, cloud-init etc.
  • ACTION: Adam to put notes about Boxgrinder etc. on the wiki
  • David commented that there is a general push for common submission tools across the experiments

CMS - Andrew Lahiff

  • Local storage access fixed the CVMFS problems
  • Jobs are running well now
  • A CVMFS race condition has been fixed
  • VMs are still 'hanging around' in OpenStack i.e. taking up resources in 'shutoff' state
  • Andrew has provided log files to developers - this is the main issue to be resolved at the moment
  • Andrew suggests more stress testing as the next task, e.g. more than 100 jobs
  • Similar issues on the HLT: need to convince HLT people to improve the networking
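
While the 'shutoff' VM issue is with the developers, lingering instances could be spotted with a simple filter like the one sketched below. This is an illustration only: it works on a plain list of dictionaries rather than calling any OpenStack API, and the field names ("name", "status", "updated") and the one-hour threshold are assumptions, not taken from the Imperial setup.

```python
from datetime import datetime, timedelta


def stale_shutoff(instances, now, max_age=timedelta(hours=1)):
    """Return names of instances that have sat in a shutoff state too long.

    `instances` is a list of dicts with hypothetical keys:
      name    - instance name
      status  - state string, compared case-insensitively with "shutoff"
      updated - datetime of the last state change
    """
    return [
        inst["name"]
        for inst in instances
        if inst["status"].lower() == "shutoff" and now - inst["updated"] > max_age
    ]


if __name__ == "__main__":
    now = datetime(2013, 3, 1, 12, 0)
    listing = [
        {"name": "vm-a", "status": "SHUTOFF", "updated": datetime(2013, 3, 1, 9, 0)},
        {"name": "vm-b", "status": "ACTIVE", "updated": datetime(2013, 3, 1, 9, 0)},
    ]
    print(stale_shutoff(listing, now))
```

The resulting list could be attached to the log files sent to the developers, or used to decide which instances to delete by hand.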

Imperial GridPP pilot

Other UK cloud sites

Connections to other projects

Security - John Green

AOB

Actions

1. Adam to provide a temporary gateway node for external access to instances

2. Adam to put notes about image creation and other relevant tools on the wiki