21–25 May 2012
New York City, NY, USA
US/Eastern timezone

Monitoring the US ATLAS Network Infrastructure with perfSONAR-PS

21 May 2012, 16:35
25m
Room 914 (Kimmel Center)

Parallel: Computer Facilities, Production Grids and Networking (track 4)

Speaker

Shawn McKee (University of Michigan (US))

Description

Global scientific collaborations such as ATLAS continue to push the envelope of network requirements. Data movement in this collaboration is projected to include the regular exchange of petabytes of data between the collection and analysis facilities in the coming years. These requirements place a high premium on networks functioning at peak efficiency and availability; a lack of either could mean critical delays in the overall scientific progress of distributed, data-intensive experiments like ATLAS. Network operations staff must routinely deal with problems deep in the infrastructure; these may be as benign as replacing a failing piece of equipment or as complex as diagnosing a multi-domain path that is experiencing data loss. In either case, it is crucial that effective monitoring and performance analysis tools are available to ease the burden of management. We will report on our experiences deploying and using the perfSONAR-PS Performance Toolkit at ATLAS sites in the United States. This software creates a dedicated monitoring server capable of collecting and performing a wide range of passive and active network measurements. Each independent instance is managed locally but can federate on a global scale, enabling a full view of the network infrastructure that spans domain boundaries. This information, available through web service interfaces, can easily be retrieved to create customized applications. USATLAS has developed a centralized "dashboard" that offers network administrators, users, and decision makers the ability to see the performance of the network at a glance. The dashboard framework can also notify users (raise an alarm) when problems are found, allowing rapid response to potential problems and making perfSONAR-PS crucial to the operation of our distributed computing infrastructure.

Primary author

Shawn McKee (University of Michigan (US))

Co-authors

Andrew Lake (ESnet)
ATLAS Collaboration
Dr Horst Severini (University of Oklahoma (US))
Jason Zurawski (Internet2)
Philippe Laurens (Michigan State University)
Stephen Wolff (Internet2)
Dr Tomasz Wlodek (Brookhaven National Laboratory)