24–28 Oct 2011
Hosted by TRIUMF, SFU and the University of Victoria at the Harbour Center - Downtown Vancouver
Canada/Pacific timezone

ATLAS Great Lakes Tier-2 Site Report

24 Oct 2011, 11:50
15m

515 West Hastings Street, Vancouver, BC, Canada V6B 5K3
Site Reports

Speaker

Dr Shawn McKee (University of Michigan ATLAS Group)

Description

We will report on the ATLAS Great Lakes Tier-2 (AGLT2), one of five US ATLAS Tier-2 sites, and give a brief overview of our experience planning, deploying, testing and maintaining the infrastructure that supports the ATLAS distributed computing model. AGLT2 is one of the larger WLCG Tier-2s worldwide, with 2.2 PB of dCache storage and 4500 job slots, so we face a number of challenges in monitoring, managing and maintaining our site. Many of those challenges relate to storage, data management and I/O capabilities. In this report we will focus on our recent work in updating, configuring and monitoring our storage systems. In addition to describing new hardware such as SSDs and multi-10GE storage nodes, we will report on tools such as pCache and LSM (Local Site Mover), along with a new "site-aware" dCache configuration, which have helped remove some bottlenecks in our infrastructure. Because AGLT2 uses a central syslog host, we can track, via LSM logging, how all of our worker nodes behave when staging files in and out. We have built a system around a custom MySQL database that tracks our local resources and merges in information from the central syslog host and the dCache billing DB, allowing us to better understand and optimize the behavior of our site's storage system. The last part of our report will show some results from using this new system.
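As a rough illustration of the kind of merge described above, the following minimal sketch joins per-transfer records (as might be parsed from LSM syslog lines) with pool information (as might be copied from the dCache billing DB) to estimate stage-in throughput per pool. It is not the AGLT2 implementation: the table names, column names, sample rows and the functions `build_demo_db` and `throughput_by_pool` are all hypothetical, and an in-memory SQLite database stands in for MySQL so the example is self-contained.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Hypothetical: one row per transfer, parsed from central syslog (LSM)
        CREATE TABLE lsm_transfers (
            node TEXT, file_id TEXT, direction TEXT, seconds REAL, nbytes INTEGER
        );
        -- Hypothetical: which pool served each file, from the billing DB
        CREATE TABLE billing (file_id TEXT, pool TEXT);
    """)
    # Made-up sample data, for illustration only
    conn.executemany(
        "INSERT INTO lsm_transfers VALUES (?, ?, ?, ?, ?)",
        [
            ("wn01", "fileA", "in", 10.0, 1_000_000_000),
            ("wn02", "fileB", "in", 40.0, 1_000_000_000),
            ("wn01", "fileC", "in", 10.0, 3_000_000_000),
            ("wn03", "fileA", "out", 5.0, 1_000_000_000),
        ],
    )
    conn.executemany(
        "INSERT INTO billing VALUES (?, ?)",
        [("fileA", "pool1"), ("fileB", "pool2"), ("fileC", "pool1")],
    )
    return conn


def throughput_by_pool(conn):
    """Return (pool, MB/s, transfer count) for stage-ins, slowest pool first."""
    return conn.execute("""
        SELECT b.pool,
               SUM(t.nbytes) / SUM(t.seconds) / 1e6 AS mb_per_s,
               COUNT(*) AS n_transfers
        FROM lsm_transfers AS t
        JOIN billing AS b ON b.file_id = t.file_id
        WHERE t.direction = 'in'
        GROUP BY b.pool
        ORDER BY mb_per_s
    """).fetchall()


if __name__ == "__main__":
    for pool, rate, count in throughput_by_pool(build_demo_db()):
        print(f"{pool}: {rate:.1f} MB/s over {count} stage-in(s)")
```

Sorting the slowest pools first is one plausible way to surface candidate bottlenecks; a real system would presumably also key on timestamps and worker node.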

Summary

A site report from AGLT2 summarizing the current status and focusing on recent storage-related efforts to find and remove bottlenecks in the infrastructure.

Primary author

Dr Shawn McKee (University of Michigan ATLAS Group)

Co-authors

Mr Ben Meekhof (University of Michigan)
Philippe Alain Luc Laurens (Michigan State University (US))
Raymond Brock (Michigan State University)
Robert Ball (High Energy Physics)
Tom Rockwell (Michigan State University)
