14–18 Oct 2013
Amsterdam, Beurs van Berlage
Europe/Amsterdam timezone

AutoPyFactory and the Cloud: Flexible, scalable, and automatic management of virtual resources for ATLAS

14 Oct 2013, 15:00
45m
Grote zaal (Amsterdam, Beurs van Berlage)

Poster presentation
Distributed Processing and Data Handling A: Infrastructure, Sites, and Virtualization
Poster presentations

Speaker

John Hover (Brookhaven National Laboratory (BNL))

Description

AutoPyFactory (APF) is a next-generation pilot submission framework that has been used as part of the ATLAS workload management system (PanDA) for two years. APF is reliable and scalable, and its configuration is easy and flexible. Using a plugin-based architecture, APF polls the configured information and batch systems (including grid sites), decides how many additional pilot jobs are needed, and submits them.

With the advent of cloud computing, providing resources goes beyond submitting pilots to grid sites: the resources on which the pilots will run also need to be managed. Handling both pilot submission and the virtual machine life cycle (creation, retirement, and termination) in the same framework allows robust and efficient management of the whole process. In this paper we describe the design and implementation of these virtual machine management capabilities in APF.

Expanding on our plugin-based approach, we allow cascades of virtual resources to be associated with a job queue. A single workflow can be directed first to a private, facility-based cloud, then to a free academic cloud, then to spot-priced EC2 resources, and finally to on-demand commercial clouds. Limits, weighting, and priorities are supported, so that free or less expensive resources are used first and costly resources are used only when necessary. As demand drops, resources are drained and terminated in reverse order.

Performance plots and time series will be included, showing how the implementation handles ramp-ups, ramp-downs, and spot terminations.
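The cascade behaviour described above can be illustrated with a minimal sketch. This is not APF code and does not use APF's configuration format or plugin API: the `CloudTier` class, the `scale_up` and `scale_down` functions, and the tier names and limits are all hypothetical, chosen only to show the idea of filling resources in priority order up to a per-tier limit and releasing them in reverse order. Weighting is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class CloudTier:
    """Hypothetical description of one cloud in the cascade."""
    name: str
    limit: int        # maximum number of VMs allowed on this tier
    running: int = 0  # VMs currently running on this tier


def scale_up(tiers, needed):
    """Start up to `needed` VMs, filling tiers in list (priority) order.

    Returns a list of (tier name, number of VMs started).
    """
    started = []
    for tier in tiers:
        if needed <= 0:
            break
        count = min(tier.limit - tier.running, needed)
        if count > 0:
            tier.running += count
            started.append((tier.name, count))
            needed -= count
    return started


def scale_down(tiers, surplus):
    """Release up to `surplus` VMs, draining tiers in reverse order.

    Returns a list of (tier name, number of VMs released).
    """
    released = []
    for tier in reversed(tiers):
        if surplus <= 0:
            break
        count = min(tier.running, surplus)
        if count > 0:
            tier.running -= count
            released.append((tier.name, count))
            surplus -= count
    return released


# Illustrative tiers only; names and limits are assumptions.
cascade = [
    CloudTier("private", limit=10),
    CloudTier("academic", limit=5),
    CloudTier("ec2-spot", limit=20),
    CloudTier("on-demand", limit=50),
]
print(scale_up(cascade, 18))
print(scale_down(cascade, 4))
```

In this sketch the costliest tier is placed last in the list, so it is the last to be used and the first to be drained, which mirrors the ordering the abstract describes.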

Primary author

John Hover (Brookhaven National Laboratory (BNL))

Co-authors

Dr Jose Caballero Bejar (Brookhaven National Laboratory (US))
Peter Love (Lancaster University)
