20th International Conference on Computing in High Energy and Nuclear Physics (CHEP2013)

Name: 20th International Conference on Computing in High Energy and Nuclear Physics (CHEP2013)
Start: 2013-10-14T09:00:00+02:00
End: 2013-10-18T13:00:00+02:00
Location: Amsterdam, Beurs van Berlage

14–18 Oct 2013

Europe/Amsterdam timezone

CHEP2013 Logistics Management

info@chep2013.org

Running jobs in the Vacuum

17 Oct 2013, 11:22

22m

Graanbeurszaal (Amsterdam, Beurs van Berlage)

Oral presentation to parallel session Distributed Processing and Data Handling A: Infrastructure, Sites, and Virtualization

Speaker: Andrew McNab (University of Manchester (GB))

We present a model for operating computing nodes at a site using virtual machines (VMs), in which the VMs are created and contextualised for virtual organisations (VOs) by the site itself. To the VO, these VMs appear to be produced spontaneously "in the vacuum" rather than in response to requests by the VO. This model takes advantage of the pilot job frameworks adopted by many VOs, in which pilot jobs submitted via the grid infrastructure in turn start job agents that fetch the real jobs from the VO's central task queue. In the vacuum model, the contextualisation process starts a job agent within the VM, and real jobs are fetched from the central task queue as normal. This is similar to ongoing cloud work in which job agents also run inside VMs, but where the VMs are created by the VO itself using cloud APIs.

An implementation of the vacuum scheme, vac, is presented in which a VM factory runs on each physical worker node to create and contextualise its set of VMs. Each node's VM factory decides which VO's VMs to run, based on site-wide target shares and on a peer-to-peer protocol in which the site's VM factories query each other to discover which VM types they are running. From this, a factory identifies which VOs' VMs should be started as nodes become available again, and which VOs' VMs should be signalled to shut down. A property of this system is that there is no gatekeeper service, head node, or batch system accepting and then directing jobs to particular worker nodes, which avoids several central points of failure.

Finally, we describe tests of the vac system using jobs from the central LHCb task queue, with the same contextualisation procedure for VMs that LHCb developed for clouds.
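The abstract does not spell out how a VM factory turns target shares and peer responses into a decision, so the following is only a minimal sketch of one plausible rule, not vac's actual algorithm. It assumes each peer factory reports a count of running VMs per VO, and that the factory starts a VM for the VO whose running fraction is furthest below its target share. The function names, the report format, and the "atlas" VO are hypothetical and chosen for illustration.

```python
from collections import Counter


def aggregate_peer_reports(reports):
    """Sum per-VO running-VM counts reported by peer VM factories.

    Each report is assumed (hypothetically) to be a dict of VO name -> count.
    """
    totals = Counter()
    for report in reports:
        totals.update(report)  # Counter.update adds counts from a mapping
    return dict(totals)


def choose_vo_to_start(target_shares, running_counts):
    """Pick the VO whose running fraction is furthest below its target share.

    This selection rule is an assumption for illustration; the abstract only
    says the decision is based on target shares and peer-reported VM types.
    """
    total_share = sum(target_shares.values())
    if total_share <= 0:
        return None
    total_running = sum(running_counts.get(vo, 0) for vo in target_shares)
    best_vo, best_deficit = None, float("-inf")
    for vo in sorted(target_shares):  # sorted for a deterministic tie-break
        wanted = target_shares[vo] / total_share
        actual = running_counts.get(vo, 0) / total_running if total_running else 0.0
        deficit = wanted - actual
        if deficit > best_deficit:
            best_vo, best_deficit = vo, deficit
    return best_vo


# Hypothetical site-wide target shares and two peer responses.
target_shares = {"lhcb": 3.0, "atlas": 1.0}
peer_reports = [{"lhcb": 2, "atlas": 2}, {"lhcb": 1, "atlas": 1}]
running = aggregate_peer_reports(peer_reports)
print(choose_vo_to_start(target_shares, running))  # prints lhcb
```

Because every factory applies the same rule to the same peer information, no central scheduler is needed in this sketch, which mirrors the abstract's point about avoiding a head node or batch system.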

Primary author: Andrew McNab (University of Manchester (GB))

Co-authors: Federico Stagni (CERN), Mario Ubeda Garcia (CERN)

Slides

mcnab-vac-17oct13.pdf
