The Open Science Grid (OSG) relies upon the network as a critical part of the distributed infrastructures it enables. In 2012, OSG added networking as a new focus area, with the goal of becoming the primary source of network information for its members and collaborators. This includes gathering, organizing and providing network metrics to ensure effective network usage and the prompt detection and resolution of network issues, including connection failures, congestion and traffic routing problems.
In September 2015 the OSG Networking Service was deployed into the OSG production environment. We will report on its creation, implementation, testing and deployment. All aspects of the implementation will be reviewed: organizing the deployment of perfSONAR toolkits within OSG and its partners, orchestrating regular testing between sites, and reliably gathering the resulting network metrics and making them available to users, virtual organizations and higher-level services. In particular, several higher-level services were developed to bring the OSG network service to its full potential. These include a web-based mesh configuration system, which allows central scheduling and management of all the network tests performed by the perfSONAR instances; a set of probes that continually gather metrics from the remote instances and publish them to different destinations; a central network datastore (Esmond), which provides interfaces to access the network monitoring information both in close to real time and historically (up to a year), giving the state of the tests; and perfSONAR infrastructure monitoring, which ensures that the current perfSONAR instances are correctly configured and operating as intended.
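As an illustration of how a client might use the datastore interfaces, the following minimal Python sketch builds a query URL for an Esmond-style measurement archive and summarizes a returned time series. It is a sketch under stated assumptions, not the OSG implementation: the host `ps-archive.example.org` is hypothetical, the query parameters (`source`, `destination`, `event-type`, `time-range`) and the `{"ts", "val"}` point layout are assumed to follow the perfSONAR Esmond archive conventions, and the helper names `build_archive_query` and `summarize_points` are introduced here for illustration only.

```python
from urllib.parse import urlencode


def build_archive_query(base_url, source, destination, event_type, time_range_s):
    """Build a metadata query URL for an Esmond-style archive.

    Parameter names are assumptions based on the perfSONAR Esmond
    archive conventions; check them against the deployed service.
    """
    params = {
        "source": source,
        "destination": destination,
        "event-type": event_type,
        "time-range": time_range_s,  # look-back window in seconds
        "format": "json",
    }
    return base_url.rstrip("/") + "/?" + urlencode(params)


def summarize_points(points):
    """Summarize a list of {"ts": ..., "val": ...} points.

    Points whose value is missing or non-numeric are skipped.
    Returns None when no usable value remains.
    """
    vals = [p["val"] for p in points if isinstance(p.get("val"), (int, float))]
    if not vals:
        return None
    return {
        "count": len(vals),
        "min": min(vals),
        "max": max(vals),
        "mean": sum(vals) / len(vals),
    }


if __name__ == "__main__":
    # Hypothetical archive host and test endpoints.
    url = build_archive_query(
        "https://ps-archive.example.org/esmond/perfsonar/archive/",
        "a.example.org",
        "b.example.org",
        "throughput",
        86400,
    )
    print(url)
    # Hand-written sample points standing in for an archive response.
    sample = [{"ts": 1, "val": 10.0}, {"ts": 2, "val": 30.0}, {"ts": 3, "val": None}]
    print(summarize_points(sample))
```

The sketch keeps URL construction and summarization separate from any network call, so the same helpers could be reused by a probe or a dashboard regardless of how the HTTP request itself is made.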
We will also describe the challenges we encountered in the ongoing operation of the network service and how we have evolved our procedures to address them. Finally, we will describe our plans for future extensions and improvements to the service.
|Primary Keyword|Collaborative tools|
|Secondary Keyword|Network systems and solutions|
|Tertiary Keyword|Monitoring|