25–29 Mar 2019
SDSC Auditorium
America/Los_Angeles timezone

Addressing the Challenges of Executing Massive Computational Clusters in the Cloud

26 Mar 2019, 11:45
25m
E-B 212 (SDSC Auditorium)

10100 Hopkins Drive La Jolla, CA 92093-0505
Grid, Cloud & Virtualisation

Speaker

Boyd Wilson (Omnibond)

Description

This talk will discuss how we worked with Dr. Amy Apon, Brandon Posey, AWS, and the Clemson DICE Lab team to dynamically provision a large-scale computational cluster of more than one million cores on Amazon Web Services (AWS). We discuss the trade-offs, challenges, and solutions associated with creating such a large-scale cluster from commercial cloud resources. We use the cluster to study a parameter sweep workflow composed of message-passing parallel topic modeling jobs on multiple datasets.

At peak, we reach a simultaneous count of 1,119,196 vCPUs across nearly 50,000 instances and execute almost half a million jobs within two hours, using AWS Spot Instances in a single AWS region.

Additionally, we will discuss a follow-on project that the DICE Lab is currently working on in the Google Cloud Platform (GCP). It will enable a computer vision analytics system to concurrently process hundreds of thousands of hours of highway traffic video, providing statistics on congestion, vehicle trajectories, and neural net pre-annotation. We will discuss how this project differs from the previous one and how additional boundaries are being pushed.

Relevant Papers:
https://ieeexplore.ieee.org/abstract/document/8411029

https://tigerprints.clemson.edu/computing_pubs/38/

Primary authors

Dr Alexander Herzog (Clemson University)
Dr Amy Apon (Clemson University)
Boyd Wilson (Omnibond)
Dr Brandon Posey (BMW)
Christopher Gropp (Clemson University)
