CHEP 2016 Conference, San Francisco, October 8-14, 2016

Name: CHEP 2016 Conference, San Francisco, October 8-14, 2016
Start: 2016-10-10T08:00:00-07:00
End: 2016-10-14T18:00:00-07:00
Location: San Francisco Marriott Marquis


Virtual Machine Provisioning, Code Management and Data Movement Design for the Fermilab HEPCloud Facility

13 Oct 2016, 15:30

1h 15m

San Francisco Marriott Marquis

Poster Track 3: Distributed Computing Posters B / Break

Speakers: Burt Holzman (Fermi National Accelerator Lab. (US)), Gabriele Garzoglio, Steven Timm (Fermilab), Stuart Fuess (Fermilab)

The Fermilab HEPCloud Facility Project aims to extend the current Fermilab facility interface to provide transparent access to disparate resources, including commercial and community clouds, grid federations, and HPC centers. This facility enables experiments to perform the full spectrum of computing tasks, including data-intensive simulation and reconstruction. We have evaluated the use of the commercial cloud to provide elasticity, responding to peaks in demand without overprovisioning local resources. Full-scale, data-intensive workflows have been successfully completed on Amazon Web Services (AWS) for two high energy physics experiments, CMS and NOvA, at a scale of 58,000 simultaneous cores. This paper describes the significant improvements made to the virtual machine provisioning system, the code caching system, and the data movement system to accomplish this work. The virtual machine image provisioning and contextualization service was extended to multiple AWS regions and to support experiment-specific data configurations. A prototype Decision Engine was written to determine the optimal availability zone and instance type on which to run, minimizing cost and job interruptions. We have deployed a scalable, on-demand caching service to deliver code and database information to jobs running on the commercial cloud. It uses the frontier-squid server and CernVM File System (CVMFS) clients on EC2 instances, and builds its infrastructure (stack) from various services provided by AWS. We discuss the architecture and the load-testing benchmarks of the squid servers. We also describe the various approaches that were evaluated for transporting experimental data to and from the cloud, and the optimal solutions used for the bulk of the data transport. Finally, we summarize the lessons learned from this scale test and our future plans to expand and improve the Fermilab HEPCloud Facility.
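The abstract does not say how the Decision Engine ranks its options. As a minimal illustrative sketch only, assuming a simple cost model in which work lost to a spot-instance interruption must be rerun, one could rank each (availability zone, instance type) pair by its expected cost per useful core-hour. The `Candidate` class, the helper functions, and every zone, instance type, price, and interruption probability below are hypothetical and are not taken from the HEPCloud implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One hypothetical (availability zone, instance type) option."""
    zone: str
    instance_type: str
    price_per_hour: float     # assumed price in USD per instance-hour (made up)
    cores: int                # cores per instance
    interruption_prob: float  # assumed fraction of work lost to interruptions (made up)


def cost_per_useful_core_hour(c: Candidate) -> float:
    """Price per core-hour, inflated by the fraction of work expected to be lost.

    Assumed model: interrupted work is rerun, so only (1 - p) of each
    paid core-hour produces completed output.
    """
    if not 0.0 <= c.interruption_prob < 1.0:
        raise ValueError("interruption_prob must be in [0, 1)")
    return c.price_per_hour / (c.cores * (1.0 - c.interruption_prob))


def choose_best(candidates: list[Candidate]) -> Candidate:
    """Return the option with the lowest expected cost per useful core-hour."""
    return min(candidates, key=cost_per_useful_core_hour)


if __name__ == "__main__":
    # Hypothetical options; the numbers are illustrative, not real AWS prices.
    options = [
        Candidate("us-east-1a", "m4.2xlarge", 0.10, 8, 0.05),
        Candidate("us-east-1b", "m4.2xlarge", 0.08, 8, 0.30),
        Candidate("us-west-2a", "c4.4xlarge", 0.25, 16, 0.02),
    ]
    best = choose_best(options)
    print(best.zone, best.instance_type)  # prints: us-east-1a m4.2xlarge
```

With these made-up numbers, the us-east-1b option has the lowest raw price per core-hour, but its high assumed interruption rate makes us-east-1a cheaper per unit of completed work. This is the kind of cost-versus-interruption trade-off the Decision Engine is described as balancing; a real engine would draw prices and interruption estimates from live data rather than constants.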

Primary Keyword (Mandatory)	Cloud technologies
Secondary Keyword (Optional)	Computing facilities
Tertiary Keyword (Optional)	Distributed data handling

Author: Steven Timm (Fermilab)

Co-authors: Anthony Tiradani (Fermilab), Burt Holzman (Fermi National Accelerator Lab. (US)), Mr Davide Grassano (Fermilab), Gabriele Garzoglio, Mr Hao Wu (Illinois Institute of Technology), Dr Hyun Woo Kim (Fermilab), Mr R. Glenn Cooper (Fermilab), Mr Rahul Krishnamurthy (Illinois Institute of Technology), Robert Kennedy (Fermilab), Prof. Shangping Ren (Illinois Institute of Technology), Mr Shivakumar Vinayagam (Illinois Institute of Technology), Stuart Fuess (Fermilab)

Presentation materials: Highlights-545.pdf, Poster-545.pdf
