Oct 10 – 14, 2016
San Francisco Marriott Marquis
America/Los_Angeles timezone

The HEP Cloud Facility: elastic computing for High Energy Physics

Oct 11, 2016, 11:30 AM
GG C2 (San Francisco Marriott Marquis)



Oral, Track 3: Distributed Computing


Gabriele Garzoglio


The HEP community's demand for computing follows cycles of peaks and valleys, driven mainly by conference dates, accelerator shutdowns, holiday schedules, and other factors. As a result, the classical approach of provisioning these resources statically at dedicated facilities has drawbacks, such as potential overprovisioning: capacity sized for the peaks sits idle during the valleys. As the appetite for computing grows, so does the need to maximize cost efficiency with a model that provisions resources dynamically, only when they are needed.

To address this issue, the Fermilab Scientific Computing Division launched the HEP Cloud project in June 2015. Its goal is to develop a facility that provides a common interface to a variety of resources, including local clusters, grids, high-performance computers, and community and commercial clouds. The initially targeted experiments are CMS and NOvA, along with other Fermilab stakeholders.
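One way to picture a facility that offers a common interface over heterogeneous resources is as a set of providers behind a single provisioning call. The Python sketch below is a hypothetical illustration, not HEP Cloud code. The `ResourceProvider` protocol, the `Pool` class, and `provision_request` are names invented here, and the policy of filling owned resources before rented ones is an assumption.

```python
from dataclasses import dataclass
from typing import Protocol


class ResourceProvider(Protocol):
    """Hypothetical common interface that each resource type would implement."""

    name: str

    def free_cores(self) -> int: ...

    def provision(self, cores: int) -> int: ...


@dataclass
class Pool:
    """Illustrative provider with a core limit. It could stand for a local
    cluster, a grid site, an HPC allocation, or a cloud account quota."""

    name: str
    limit: int
    used: int = 0

    def free_cores(self) -> int:
        return self.limit - self.used

    def provision(self, cores: int) -> int:
        # Grant as much of the request as this provider can currently supply.
        granted = min(cores, self.free_cores())
        self.used += granted
        return granted


def provision_request(providers: list[ResourceProvider], cores: int) -> dict[str, int]:
    """Spread a request over providers in priority order, so owned resources
    are filled first and rented ones absorb only the overflow (assumed policy)."""
    allocation: dict[str, int] = {}
    remaining = cores
    for provider in providers:
        if remaining == 0:
            break
        granted = provider.provision(remaining)
        if granted:
            allocation[provider.name] = granted
            remaining -= granted
    return allocation
```

For example, with `local = Pool("local-cluster", limit=2000)` and `cloud = Pool("commercial-cloud", limit=10000)`, the call `provision_request([local, cloud], 5000)` returns `{'local-cluster': 2000, 'commercial-cloud': 3000}`. In a real facility, each backend would implement `free_cores` and `provision` in its own way. The caller would see only the shared interface.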

In its first phase, the project demonstrated the “elastic” provisioning model offered by commercial clouds such as Amazon Web Services (AWS), in which resources are rented and provisioned automatically over the Internet on request. In January 2016, in preparation for the Rencontres de Moriond, the project demonstrated the ability to increase the total global CMS resources by 58,000 cores from 150,000 cores, a 25 percent increase. In March 2016, in preparation for the Neutrino 2016 conference, the NOvA experiment demonstrated a resource burst of an additional 7,300 cores. This reached a scale almost four times that of the locally allocated resources and used local AWS S3 storage to optimize data handling operations and costs. NOvA used the same familiar services as for its local computations, such as data handling and job submission. In both cases, costs were contained by using the Amazon Spot Instance market and the Decision Engine, a HEP Cloud component that aims to minimize cost and job interruption.
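The abstract says only that the Decision Engine aims to minimize cost and job interruption; it does not describe its algorithm. The hypothetical Python below is a hedged sketch of that trade-off. It scores spot offers by an interruption-adjusted price per core-hour and picks the cheapest offer under a risk cap. `SpotOffer`, `effective_core_hour_cost`, and `choose_offer` are assumptions made for illustration, as is the cost model itself, in which a reclaimed hour of work is wasted and rerun. None of this is the Decision Engine's actual logic or any AWS API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpotOffer:
    """One hypothetical spot-market offer for an instance type in a zone."""

    instance_type: str
    zone: str
    cores: int
    price_per_hour: float     # current spot price for the whole instance (USD)
    interruption_rate: float  # assumed chance the instance is reclaimed within an hour


def effective_core_hour_cost(offer: SpotOffer) -> float:
    """Price per useful core-hour under a simplified model in which an
    interrupted hour of work is lost, so only (1 - interruption_rate) of the
    paid hours produce results."""
    useful_fraction = 1.0 - offer.interruption_rate
    if useful_fraction <= 0.0:
        return float("inf")  # an offer that is always reclaimed does no useful work
    return offer.price_per_hour / offer.cores / useful_fraction


def choose_offer(offers: list[SpotOffer], max_interruption: float) -> Optional[SpotOffer]:
    """Return the cheapest offer by effective cost among those whose
    interruption rate does not exceed max_interruption, or None if none qualify."""
    acceptable = [o for o in offers if o.interruption_rate <= max_interruption]
    if not acceptable:
        return None
    return min(acceptable, key=effective_core_hour_cost)
```

Under this model, the offer with the lowest raw per-core price can still lose. Take two illustrative offers. An 8-core instance at $0.36/hour with a 30 percent interruption rate costs about $0.064 per useful core-hour. A 4-core instance at $0.20/hour with a 5 percent rate costs about $0.053. The second offer wins even though its per-core list price is higher.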

This paper describes the Fermilab HEP Cloud Facility and the challenges overcome in serving the CMS and NOvA communities.

Primary keyword: Computing facilities
Secondary keyword: Cloud technologies

Primary authors

Anthony Tiradani (Fermilab), Burt Holzman (Fermilab), Gabriele Garzoglio, Robert Kennedy (Fermilab), Steven Timm (Fermilab), Stuart Fuess (Fermilab)
