13–17 Feb 2006
Tata Institute of Fundamental Research

L-TEST: A FRAMEWORK FOR SIMPLIFIED TESTING OF DISTRIBUTED HIGH-PERFORMANCE COMPUTER SUB-SYSTEMS

13 Feb 2006, 11:00 (duration 7h 10m)
Tata Institute of Fundamental Research

Homi Bhabha Road, Mumbai 400005, India
Poster session; track: Grid middleware and e-Infrastructure operation

Speaker

Mr Laurence Dawson (Vanderbilt University)

Description

Introducing changes to a working high-performance computing environment is typically both necessary and risky, and testing those changes can be highly manpower-intensive. L-TEST supplies a framework that allows complex distributed systems to be tested with reduced configuration: setting up a test is reduced to implementing the tasks specific to that test.

L-TEST handles three jobs that must be performed for any distributed test: task communication, which moves tasks to execution nodes; generation of reproducible stochastic distributions of tasks; and collection of test results. Tasks are communicated via a dynamic and configurable set of storage systems. These storage systems can be reused for result collection, or a parallel set of systems may be set up for the results. The task generation framework supplies a basic set of stochastic generators, along with framework code for calling these generators. The full workload is generated by aggregating multiple generator instances, which allows complex task configurations.

Although L-TEST does not restrict the tester to the following cases, this paper identifies several use cases of particular interest. The development of the L-STORE distributed file-system required testing for both correctness and performance, and this paper describes how L-TEST was used to test both. Read and write performance data and integrity data were reported to separate communicators and analyzed separately. The performance configuration of L-TEST was also used, almost unchanged, to test a parallel file-system introduced on the ACCRE parallel cluster. In addition to testing the performance and integrity of file-systems, we describe how L-TEST can test the effect of planned changes on several characteristics of a cluster supercomputer, namely network bandwidth and latency, and the task scheduling system used to submit jobs to the cluster.
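The abstract describes reproducible stochastic task generation built by aggregating several generator instances. The following is a minimal sketch of that idea under stated assumptions, not L-TEST's actual code: the `Task` fields, the `PoissonSizeGenerator` class (exponential inter-arrival times, uniform sizes) and the `aggregate`/`build_workload` helpers are all hypothetical. Reproducibility comes from giving every generator its own explicitly seeded random number generator, with per-generator seeds derived from one master seed.

```python
import random
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class Task:
    """One unit of work; these fields are hypothetical, not L-TEST's schema."""
    kind: str        # e.g. "read" or "write"
    size_bytes: int  # amount of data the task touches
    start_s: float   # scheduled start time, seconds from test start


class PoissonSizeGenerator:
    """Hypothetical generator: exponential inter-arrival times, uniform sizes.

    Each instance owns a private random.Random seeded explicitly, so the same
    seed always yields the same task sequence.
    """

    def __init__(self, seed: int, kind: str, rate_per_s: float,
                 min_size: int, max_size: int, count: int):
        self.rng = random.Random(seed)
        self.kind = kind
        self.rate = rate_per_s
        self.min_size, self.max_size = min_size, max_size
        self.count = count

    def tasks(self) -> Iterator[Task]:
        t = 0.0
        for _ in range(self.count):
            t += self.rng.expovariate(self.rate)  # Poisson arrival process
            size = self.rng.randint(self.min_size, self.max_size)  # inclusive
            yield Task(self.kind, size, t)


def aggregate(generators) -> List[Task]:
    """Merge several generator streams into one workload ordered by start time."""
    workload = [task for g in generators for task in g.tasks()]
    workload.sort(key=lambda task: task.start_s)
    return workload


def build_workload(seed: int) -> List[Task]:
    # Derive one seed per generator from a single master seed, so the whole
    # aggregated workload is reproducible from one number.
    master = random.Random(seed)
    return aggregate([
        PoissonSizeGenerator(master.randrange(2**32), "read", 5.0, 4096, 1 << 20, 50),
        PoissonSizeGenerator(master.randrange(2**32), "write", 2.0, 4096, 1 << 22, 20),
    ])


if __name__ == "__main__":
    workload = build_workload(seed=42)
    print(len(workload), workload[0])
```

Rerunning a test with the same master seed regenerates an identical workload, which is what makes a failing run repeatable; changing the seed gives a fresh sample from the same distributions.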
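The abstract says tasks move to execution nodes through a configurable set of storage systems, and that results can go to separate communicators (for example, one for performance data and one for integrity data). The sketch below assumes a communicator backed by a directory on a shared storage system; the `DirectoryCommunicator` class, its `put`/`get`/`drain` methods and the round-robin `dispatch` helper are hypothetical illustrations, not L-TEST's interface. Messages are JSON files published with an atomic rename, and a worker claims a message by renaming it, so two workers sharing one directory should not take the same task (this relies on POSIX rename semantics within one file system).

```python
import json
import os
import socket
import tempfile
from typing import Dict, Iterable, List, Optional


class DirectoryCommunicator:
    """Hypothetical communicator: one directory on some storage system."""

    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)
        self._seq = 0

    def put(self, message: Dict) -> None:
        self._seq += 1
        # Host and pid in the name keep names unique across producers.
        name = f"{socket.gethostname()}-{os.getpid()}-{self._seq:08d}.json"
        tmp = os.path.join(self.root, name + ".tmp")
        with open(tmp, "w") as f:
            json.dump(message, f)
        os.replace(tmp, os.path.join(self.root, name))  # publish atomically

    def get(self) -> Optional[Dict]:
        for name in sorted(os.listdir(self.root)):
            if not name.endswith(".json"):
                continue  # skip half-written (.tmp) and claimed files
            src = os.path.join(self.root, name)
            claimed = src + ".claimed"
            try:
                os.rename(src, claimed)  # only one reader can win this rename
            except FileNotFoundError:
                continue  # another worker claimed it first
            with open(claimed) as f:
                message = json.load(f)
            os.remove(claimed)
            return message
        return None

    def drain(self) -> List[Dict]:
        out = []
        while (m := self.get()) is not None:
            out.append(m)
        return out


def dispatch(tasks: Iterable[Dict], communicators: List[DirectoryCommunicator]) -> None:
    """Spread tasks round-robin over a configurable set of communicators."""
    for i, task in enumerate(tasks):
        communicators[i % len(communicators)].put(task)


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    task_comms = [DirectoryCommunicator(os.path.join(base, f"tasks{i}")) for i in range(2)]
    perf = DirectoryCommunicator(os.path.join(base, "perf"))
    integrity = DirectoryCommunicator(os.path.join(base, "integrity"))
    dispatch(({"id": i, "op": "write"} for i in range(4)), task_comms)
    # A worker pulls tasks and reports to two separate result communicators.
    for comm in task_comms:
        while (task := comm.get()) is not None:
            perf.put({"id": task["id"], "seconds": 0.0})
            integrity.put({"id": task["id"], "ok": True})
    print(len(perf.drain()), len(integrity.drain()))
```

Because task delivery and result collection share one abstraction, the same storage systems can serve both roles, or a parallel set of directories can be configured for results, as the abstract describes.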
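For the L-STORE use case, the abstract reports read and write performance data and integrity data separately. One plausible shape for such a file-system task is sketched below, assuming a seeded pseudo-random payload so that a verifier can regenerate the expected bytes without storing them; `make_payload` and `run_io_task` are hypothetical names, and the record fields are illustrative. It requires Python 3.9+ for `random.Random.randbytes`.

```python
import hashlib
import os
import random
import tempfile
import time
from typing import Dict, Tuple


def make_payload(seed: int, size: int) -> bytes:
    """Deterministic payload: the same (seed, size) always gives the same bytes."""
    return random.Random(seed).randbytes(size)


def run_io_task(path: str, seed: int, size: int) -> Tuple[Dict, Dict]:
    """Write, then read back, one file; return (performance, integrity) records."""
    payload = make_payload(seed, size)
    expected = hashlib.sha256(payload).hexdigest()

    t0 = time.perf_counter()
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # push data to the file system, not just the page cache
    write_s = time.perf_counter() - t0

    t0 = time.perf_counter()
    with open(path, "rb") as f:
        # Note: this read may be served from the local cache; a real
        # performance test would need to defeat client-side caching.
        data = f.read()
    read_s = time.perf_counter() - t0

    perf = {"path": path, "bytes": size, "write_s": write_s, "read_s": read_s}
    integrity = {"path": path, "seed": seed,
                 "ok": hashlib.sha256(data).hexdigest() == expected}
    return perf, integrity


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        perf, integ = run_io_task(os.path.join(d, "f0"), seed=7, size=1 << 20)
        print(integ["ok"], perf["bytes"])
```

Returning the two records separately mirrors the abstract's split: timing records can go to a performance communicator and checksum results to an integrity communicator, and each stream can be analyzed on its own.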

Primary author

Mr Laurence Dawson (Vanderbilt University)

Co-authors

Prof. Alan Tackett (Vanderbilt University), Prof. Paul Sheldon (Vanderbilt University)