Cloud pre-GDB meeting, March 12, 2013 (KIT)

Agenda

https://indico.cern.ch/conferenceDisplay.py?confId=223689

Attendees

Michel Jouvin, Tony Cass, Ulrich Schwickerath, Helge Meinhard, Stefan Roiser, Hélène Cordier, Andreas Heiss, Manfred Alef, Dorin Lobuntu, Dimitri Nilsen, Andreas Petzold, Ian Collier, Dave Colling, Tim Bell, Claudio Grandi, Andrew McNab, Adam Huffman, Gavin McCance

Introduction - M. Jouvin

See slides.

Security

  • Michel wonders if sites are still as fearful as a few years ago about users having root access, commenting that CERN, for example, is more relaxed now than in the past.
  • Tony C notes that sites are likely to disagree on this issue and that it would be better to have a common position defined by the security team, to be accepted as the standard. For the moment, the policies still require sites to maintain traceability logs, an ability that might be compromised if users have root access. Maarten Litmaath has noted that site managers are presumed to be responsible for private clouds, unlike public clouds, where the end user (credit card holder) is responsible. In the CERN case, the relaxed view applies to CERN users submitting directly; it is not clear whether it extends to grid-submitted jobs.
    • Linda Cornwall says that these issues are being addressed but this will take time.
  • Michel J: Are grid security issues relevant if the submission is purely cloud-like, not using grid tools?
    • Tony C: Even if we are not using grid interfaces, the trust between WLCG sites depends in part on the WLCG security policies so these must be respected
  • Tim B: is traceability enough?
    • Tony C: Maybe, but can you trust the traceability if someone has root access?
    • Does the site need to have all the information on traceability inside the image (e.g. rsyslog)?
  • Claudio Grandi suggests that if root access is provided, it might be confined to a partitioned-off part of the network, with restricted access to the storage system, since unrestricted access would not be acceptable.
  • Michel asks if experiments can comment on the requirements, noting that end users are likely to be working through an experiment framework and are unlikely to require root access. Is the problem restricted to a need for root access by authorised individuals, rather than end users?
    • Claudio G agrees with the view that end users don't need root access.
    • Ulrich: only the person who launches the VM can pass their ssh key to gain root access. This does not imply that later users of the VM will have root access as well.
  • Helge: what are the technically compelling reasons for needing root access? This is not available on current worker nodes so why is it needed on clouds?
    • Tim B: how is contextualisation done without root access?
    • Ulrich S: this is addressed by the HEPiX vwg approach, which provides the necessary hooks to be used at instantiation.
    • Michel J: but can everything be done in this way, for example joining the experiment-specific job framework?
    • Ulrich S: in principle yes, as the image has been prepared by the experiments.
  • Tim B: we need the users to express the requirements for how they want to do contextualisation.
    • Tony C: possibly, but we need to be able to bridge the differing requirements of sites and users.

Conclusion: there is no requirement for individual end users to have root access. There might be a requirement for the experiment framework to have root access, but this is something we ought to be able to support as a community, given that traceability already requires collaboration between sites and VOs (cf. glexec).

  • Dave C: good that we agree, but important that this is validated by the relevant security bodies.
  • Ulrich S: do we need some mechanism like glexec in the virtual machines to trace user identity changes, linked to Argus so that sites can easily ban a user without shutting down all VMs from a VO?
    • Michel: possibly, to be discussed with security/glexec experts and to be considered as part of the overall security policy for use of images.

Contextualisation

  • Introduction from Michel: contextualisation is required because an image cannot run everywhere without it. Similarly, some common policy is needed so that images can run everywhere. Contextualisation includes user requirements (credential passing) and site requirements (e.g. establishing logging).
  • There seem to be two competing options:
    • the amiconfig-based approach, developed by the HEPiX vwg and well integrated into CERNVM (which many experiments have said will be the basis of their cloud images), and
    • CloudInit, which appears to be the growing choice outside our community.
  • Dave C: why do we need to make a choice? Can't VOs and cloud providers agree?
    • Michel: yes, but we should minimise the effort for experiments to use the resources we make available through WLCG.
    • Ulrich: the user needs to know what the image supports and must structure the user data accordingly.
  • Ulrich S points out the link with the previous topic: amiconfig contextualisation works without root access, but root access is needed for CloudInit.
  • Tim B notes that an advantage of a single contextualisation mechanism is that sites can provide more help if they know what is being used. CloudInit is preferred for the CERN cloud as it has stronger support from companies and has developed actively over the past couple of years, unlike amiconfig, which is rather static.
  • Stefan R: LHCb is using CERNVM and finds that amiconfig is sufficient to support all their needs.
  • Michel J says that unfortunately Predrag Buncic was not able to join the meeting; his input is needed to know how difficult it would be for CERNVM to support CloudInit. Someone suggests that this is planned and is likely to be delivered with the SL6 version of CERNVM.

Summary: Agreement that we should standardise on a single contextualisation mechanism. Even if there are no strong requirements, CloudInit looks to be the more forward-looking choice.
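As a purely illustrative sketch of the kind of contextualisation discussed above, a minimal CloudInit user-data file; the package choice, file path, and pilot command are hypothetical placeholders, not an agreed WLCG convention:

    #cloud-config
    # Hypothetical contextualisation for a WLCG worker VM (illustrative only)
    packages:
      - rsyslog                        # site traceability requirement
    write_files:
      - path: /etc/vo-pilot.conf       # hypothetical experiment configuration
        content: |
          vo = lhcb
    runcmd:
      - [ /opt/vo/bin/start-pilot ]    # hypothetical: join the experiment job framework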

  • Ulrich notes that care has to be taken when passing credentials to the image via contextualisation, as part of the exchange is in the clear. If root access is available then the whole exchange can be encrypted.
    • As an example, contextualisation via the magic IP in OpenStack is by default done by a download over plain HTTP (see the sketch below).
    • Claudio G notes that this doesn't have to be done as root; it could be done as any user.
    • Michel J points out that you can probably only guarantee the existence of the root account before contextualisation.
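To make the magic-IP point concrete, a minimal Python sketch of how a booting VM retrieves its user data from the EC2-style metadata service used by OpenStack; 169.254.169.254 and the path are the conventional ones:

    # Minimal sketch: fetch user data from the EC2-style metadata service.
    # Note the plain-http URL: any credentials embedded in the user data
    # travel unencrypted over this hop.
    from urllib.request import urlopen

    METADATA_URL = "http://169.254.169.254/latest/user-data"
    user_data = urlopen(METADATA_URL, timeout=10).read()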

VM instantiation

  • Is the instantiation interface a critical issue? Experiments seem able to have backends that can already interface to different site requirements, even CMS, who are most in favour of using EC2. There are no obvious external standards.
    • In fact most experiments (not clear for ATLAS, who were not present because of their conflicting SW week) use abstract APIs provided by libraries such as libCloud (DIRAC) and CERNVM Cloud (ALICE). CMS may decide to use deltaCloud, which is supported by Condor and thus available for free.
    • When using these libraries (which provide plugins for all the main cloud MW), there is no need for an agreed standard among experiments: which library to use is an experiment decision (no impact on sites or other experiments). A minimal sketch of this approach follows this list.
  • Helge: sites shouldn't have to support lots of different things, and why is there any need to support anything other than EC2?
    • Michel: difficulty with EC2 is the differences in terms of authentication. It might be fine for sites but experiments may have to work with many EC2 flavours.
  • Michel J suggests that OCCI should not be used, although it was noted that this had been adopted by the EGI TF as it is a more formal standard than EC2. Sites participating in the EGI Federated Cloud are required to support this. OCCI implementations tend to exist for the main cloud MW but they are not mainstream in any of them.
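As an illustration of the abstract-API approach mentioned above, a minimal sketch using Apache libcloud's provider-neutral compute API; the credentials, region, and choice of flavour/image are placeholders:

    # Minimal sketch: instantiate a VM through libcloud's provider-neutral API.
    # Swapping Provider.EC2 for e.g. Provider.OPENSTACK changes the backend
    # without changing the experiment-side code.
    from libcloud.compute.types import Provider
    from libcloud.compute.providers import get_driver

    Driver = get_driver(Provider.EC2)
    conn = Driver("ACCESS_KEY", "SECRET_KEY", region="eu-west-1")  # placeholders

    size = conn.list_sizes()[0]      # pick a flavour
    image = conn.list_images()[0]    # pick an image (e.g. a CERNVM-based one)
    node = conn.create_node(name="vo-pilot", size=size, image=image)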

VM Duration

  • Michel J notes from the GDB discussion that experiments expect virtual machine durations to be measured in weeks rather than days. Two corollaries:
    • the need for VOs to be able to shut down a virtual machine they no longer need, and
    • the need for sites to be able to terminate VMs gracefully when they have to reclaim resources.
  • Michel notes that there is not universal agreement that this is needed but he personally thinks that resources won't be allocated above pledge values if they cannot be reclaimed and that experiments won't appreciate abrupt VM termination.
  • A proposal was made in previous discussions to guarantee for each instantiated VM, based on SLAs, a minimum time to live and a minimum advance notice before termination.
    • This only covers normal operations, not operational incidents at sites that lead to VMs being killed immediately, or to a site having to shut down or power off many resources.
  • Both of these lead to the question of how sites can communicate with the virtual machine. The HEPiX vwg made a proposal but is this acceptable?
  • Stefan R: If sites are going to choose which VM to terminate then LHCb wants at least 48 hours notice. Is this acceptable? They would be willing to terminate a VM of their choice with much less notice.
    • Andrew McNab: 48 hours seems reasonable.
  • Someone questions, though, whether other experiments would be willing to wait 48 hours to get resources back.
    • Gavin McC suggests that there could be different SLAs for resources provided with the pledge (guaranteed 48 hours notice) or those above (shorter notice possible).
    • Agreement that the advance notice period should be fixed when the VM is started and not changed afterwards
  • Someone (Andrew McN) says that jobs should be given the absolute time at which they should end so they can decide whether to end early or to run shorter payloads up to that point in time.
  • Tony C says that this is exactly what was proposed by the HEPiX vwg. In response to the comments on Michel's slides that the update mechanism was not defined, TC says this was deliberate: sites can choose whatever mechanism is most appropriate to them, but should also share experiences.
    • A script implementing the update mechanism would be deployed as part of the site contextualisation.
  • Tim B expresses some worry about having hypervisors intervene to update virtual machines they control. TC says this does not have to be the case; the file could be in a filesystem mounted at contextualisation or updated by a process launched at contextualisation.
    • Gavin McC suggests the HEPiX proposal could be modified to say this is a command that can be executed to return the machine end time.
    • There was more discussion on this later. Michel J points out, though, that the merit of the "file containing the end timestamp" is its simplicity from the point of view of the user (see the sketch after this list).
  • There was some discussion of Stefan R's idea that sites could ask experiments to return resources of their choice. This was felt to be interesting but would require communication between site infrastructure and pilot job frameworks which risks adding complexity.
    • Gavin McC again says that resources above pledge/quota could be with a different SLA and recovered more quickly.
    • Tony C suggests that we should wait and see what happens in practice, as it may be that VMs terminate often enough that we don't need a complex mechanism; if we do need one, what we design will address actual experiences. This suggestion was agreed.
  • In response to a comment from Tim B, Tony C says there is no requirement on sites to update the originally defined end date of a virtual machine, and that he does not consider this a commitment that a virtual machine will be maintained until that time (e.g. via live migration); hardware failures will terminate virtual machines just as they terminate real ones today.
  • There were no objections to Michel's proposal to go ahead with long lifetimes for VMs and with the HEPiX vwg proposal for signalling end-of-life and how to end prematurely.
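As a concrete illustration of the "file containing the end timestamp" idea, a minimal Python sketch of how a payload could consume it; the file location and format are assumptions, since the proposal deliberately leaves the update mechanism to the site:

    # Sketch: read the advertised VM end time and decide whether to start
    # another payload. The path and the "absolute Unix timestamp" format
    # are hypothetical, not part of an agreed standard.
    import time

    END_TIME_FILE = "/etc/machinefeatures/shutdowntime"  # hypothetical path

    def seconds_remaining():
        try:
            with open(END_TIME_FILE) as f:
                end_ts = int(f.read().strip())           # absolute Unix timestamp
        except (IOError, ValueError):
            return None                                   # no end time advertised
        return end_ts - int(time.time())

    remaining = seconds_remaining()
    if remaining is not None and remaining < 48 * 3600:
        pass  # drain: finish running work, do not pull new payloads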

VM fairshare scheduling

  • Michel notes that mention of fairshare is disliked, as it is interpreted as importing obsolete ideas into the cloud world. He understands the approach to be that there is no static partitioning of resources; rather, resources are shared to maximise utilisation, while retaining enough control to ensure that resource pledges are fulfilled over an extended period. Are there objections to this view of the meaning of fairshares in a cloud environment? MJ noted that this approach requires the ability to end machines early, but this was just agreed.
  • There were no objections. Ulrich notes, however, that unlike batch systems, clouds do not have a queue from which sites can choose which VO's work to execute. This makes it difficult for a site to be notified of a request from a new VO (one currently not "favoured").
    • Fairshare requires knowledge of unsatisfied/pending requests.
    • It is far from clear that we want to reintroduce queuing into the cloud.
    • Claudio: if we need to reinvent/reimplement batch scheduler features in the cloud world, we would be better off using a real batch scheduler! Clouds cannot be considered an alternative to the current grid CE technology if they don't provide the required features.
  • Ulrich suggests we should explore whether another model is possible, reusing the concepts found in public clouds, based on resource prices and the credit card: an economic model with credit granted to VOs and consumed as resources are used (a toy sketch follows this list).
    • Don't shut down VMs, but make them progressively more expensive, while giving a fixed amount of credits to the VOs.
    • The VO with the most credit would be most likely to have a resource request satisfied, or sites could extend the lifetime of VMs for that VO.
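A toy sketch in Python, purely to fix ideas, of the economic model described above; all prices, ageing rates, and VO balances are invented:

    # Toy model: VOs hold credits; running VMs are charged at a price that
    # grows with VM age, and the next free slot goes to the richest VO.
    BASE_PRICE = 1.0   # credits per core-hour (invented)
    AGEING = 0.02      # fractional price growth per hour of VM age (invented)

    def hourly_price(vm_age_hours):
        # VMs become progressively more expensive the longer they run
        return BASE_PRICE * (1.0 + AGEING * vm_age_hours)

    def charge(balances, vo, vm_age_hours, cores):
        balances[vo] -= hourly_price(vm_age_hours) * cores

    def next_request_winner(balances):
        # the VO with the largest remaining credit gets the next free slot
        return max(balances, key=balances.get)

    balances = {"vo-a": 100.0, "vo-b": 80.0}
    charge(balances, "vo-a", vm_age_hours=240, cores=8)  # old VMs cost more
    print(next_request_winner(balances))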

Conclusion: more discussion is needed on how to notify sites of requests, and on the other possible models.

Accounting

Michel notes that

  • there is general agreement to account on wall-clock time used
    • but notes Andrew McNab's point that resource agencies won't like resources being fully accounted on a wall-clock basis if the CPUs are idle; experiments must make sure idle VMs are terminated,
  • APEL has shown that it can report use for private clouds
    • we are not responsible for reporting usage of public clouds.
    • This is nuanced by Helge and Tony: sites are not responsible for reporting this usage if it is a private experiment decision to use public clouds (as opposed to a site choosing to fulfil pledges through use of a public cloud). However, experiments should track this usage, as funding agencies will very likely want a proper overview of the money going into computing.

There were no objections to these points.

  • Michel raises the problem of accounting and benchmarking: how can you know the HS06 capacity of a virtual machine, especially as the number of VMs running on a hypervisor varies?
    • Ulrich suggests that it could be possible to arrange on the hypervisor that VMs are guaranteed a fixed HS06 performance (a minimal sketch follows this list).
      • Tony wonders if this would artificially increase latency for users if the HS06 rating is limited to something lower than the real machine's performance.
      • Ulrich says not: there would simply be a guaranteed minimum; jobs would achieve this or not be allocated wall-clock time. If fewer VMs than cores were running on a machine, the actual performance would be higher than the guarantee.
    • Michel asks if this problem is really specific to the cloud. His feeling is that this is a problem we already had in the grid world but were unable to solve. This may be an area where we try to do better, but we should keep in mind that the grid solution has so far been accepted by everybody.
    • Michel asks Ulrich to follow up on this suggestion.
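To illustrate the "guaranteed HS06" suggestion, a minimal sketch of wall-clock accounting against a guaranteed per-core rating; the numbers are invented:

    # Sketch: account a VM at its guaranteed HS06 rating, independently of
    # how many VMs happen to share the hypervisor at any moment.
    def accounted_work(wall_clock_hours, guaranteed_hs06_per_core, cores):
        """Accounted capacity use, in HS06-hours."""
        return wall_clock_hours * guaranteed_hs06_per_core * cores

    # e.g. a 4-core VM guaranteed 10 HS06/core, alive for 24 hours:
    print(accounted_work(24, 10.0, 4))  # 960.0 HS06-hours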

Wrap-up

Good progress on identifying the consensus and the real issues

  • It is clear from today's discussions that the main issue to work on is the graceful termination of VMs
    • Probably not much more discussion is needed, but rather some real testing of the ideas discussed
  • Contextualisation: check CERNVM's plans for CloudInit support
    • But this is not a showstopper for work in the short term
  • Continue discussions on the other topics in the e-group, in particular VM scheduling ideas and accounting

Another meeting like this one would probably be good in the future

  • 2-3 months from now
  • Need new input from concrete work and real testing to avoid restarting the same discussions
