This talk will discuss how we, working with Dr. Amy Apon, Brandon Posey, AWS, and the Clemson DICE lab team, dynamically provisioned a large-scale computational cluster of more than one million cores using Amazon Web Services (AWS). We discuss the trade-offs, challenges, and solutions associated with creating such a large-scale cluster with commercial cloud resources. We utilize our large...
I present recent developments in our cloudscheduler, which we use to run HEP workloads on various clouds in North America and Europe. We are working on a complete rewrite using modern software technologies and practices.
GlideinWMS is a workload management and provisioning system that lets
you share computing resources distributed over independent sites. A
dynamically sized pool of resources is created by GlideinWMS pilot
Factories, based on the requests made by GlideinWMS Frontends. More
than 400 computing elements are currently serving more than 10
virtual organizations through GlideinWMS. This contribution...
The KEK Central Computer System (KEKCC) is a service that provides large-scale computing resources, Grid and Cloud computing, and common IT services. The KEKCC is entirely replaced every four or five years, in accordance with Japanese government procurement policy for computer systems. The current KEKCC has been in operation since September 2016, and decommissioning will start in early...
The Pacific Research Platform (PRP) operates a Kubernetes cluster that manages over 2.5k CPU cores and 250 GPUs. Most of the resources are used interactively by local users, who start Kubernetes Pods directly.
To fully utilize the available resources, we have deployed an opportunistic HTCondor pool as a Kubernetes deployment, with the worker node environment being fully OSG-compliant....
The vast breadth and configuration possibilities of the public cloud offer intriguing opportunities for loosely coupled computing tasks. One such class of tasks is simply statistical in nature, requiring many independent trials over the targeted phase space in order to converge on robust, fault-tolerant, and optimized designs. Our single-threaded target application (50-200 MB) solves a...
CERN, the European Laboratory for Particle Physics, runs OpenStack for its private Cloud Infrastructure, among other leading open source tools that help thousands of scientists around the world to uncover the mysteries of the Universe.
In 2012, CERN started the deployment of its private Cloud Infrastructure using OpenStack. Since then, we have moved from a few hundred cores to a multi-cell...
Modern software development workflows often use a developer’s local machine as the first platform for testing code. SLATE mimics this paradigm with a lightweight implementation, called MiniSLATE, that runs completely contained on the developer’s local machine (laptop, virtual machine, or another physical server). MiniSLATE resolves many development...
In the spring of 2018, central operations services were migrated out of the Grid Operations Center of Indiana into other participating Open Science Grid institutions. This talk summarizes how the migration has affected the services provided by the OSG, and gives a summary of how central OSG services interface with US WLCG sites.