Feb 13 – 17, 2006
Tata Institute of Fundamental Research

Measuring Quality of Service on Nodes in a Cluster

Feb 15, 2006, 2:00 PM (20 min)
D405, Tata Institute of Fundamental Research, Homi Bhabha Road, Mumbai 400005, India
Oral presentation, Computing Facilities and Networking track

Speaker

Mr Rohitashva Sharma (BARC)

Description

It is important to know the Quality of Service (QoS) offered by the nodes in a cluster, both for users and for load-balancing systems such as LSF, PBS and CONDOR that decide which node a job is submitted to. This knowledge helps achieve optimal utilization of the nodes in a cluster. Simple metrics such as load average and memory utilization do not adequately describe the load on a node or the QoS experienced by user jobs running on it. We undertook a project to predict the QoS seen by a user job on a cluster node by correlating simple metrics such as load average, memory utilization and I/O on the node. This paper presents our efforts and the methodology we followed for predicting the QoS of nodes in a cluster.

Brief description of the approach: user jobs are divided mainly into CPU-intensive, memory-intensive and I/O-intensive jobs. We created probe programs to represent each type of job, and load programs to generate different types of load on the system. We used the EDG-Fabric-Monitoring System to monitor system metrics on the cluster nodes. The execution times of the sample probe programs and the values of the system metrics were measured under different load conditions, and we tried to correlate the execution time of the probe programs with the values of the system metrics. This correlation gives a better measure of the QoS experienced by user programs (probes) on the system. Based on our experience, we added a metric called 'VmstatR' to the monitoring system.

We have derived the QoS metric in three different ways: (I) using the Unix load average metric, (II) using the VmstatR metric, and (III) using the CPU utilization and the load on the node. We will discuss the differences between the measured execution times of the sample probe programs and the execution times predicted by the QoS metric derived in each of these ways.

We have also studied the behaviour of the CMSIM (simulation) and ORCA (reconstruction) programs under various load conditions and tried to find a correlation metric to predict the QoS for these jobs.
Finally, we will present the difficulties we experienced in predicting the QoS of nodes in a cluster.
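As a rough illustration of the correlation step described above, the sketch below times a hypothetical CPU-intensive probe, fits a least-squares line of probe execution time against the 1-minute Unix load average, and computes the Pearson correlation between the two. It corresponds loosely to approach (I). The probe workload, the function names (`cpu_probe`, `measure_once`, `fit_line`, `pearson`, `predict_slowdown`) and the choice of a linear model are assumptions made for illustration, not the authors' published method. `os.getloadavg()` is available only on Unix-like systems.

```python
import math
import os
import time


def cpu_probe(iterations: int = 200_000) -> float:
    """Hypothetical CPU-intensive probe: wall-clock seconds for a fixed loop."""
    start = time.perf_counter()
    acc = 0.0
    for i in range(1, iterations + 1):
        acc += math.sqrt(i)  # pure arithmetic, no I/O or large allocations
    return time.perf_counter() - start


def measure_once(iterations: int = 200_000) -> tuple[float, float]:
    """Return (1-minute load average, probe execution time). Unix only."""
    load_1min = os.getloadavg()[0]
    return load_1min, cpu_probe(iterations)


def _centered_sums(xs, ys):
    """Return (mean_x, mean_y, Sxy, Sxx, Syy) for two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxy, sxx, syy


def pearson(xs, ys) -> float:
    """Pearson correlation between a system metric and probe execution time."""
    _, _, sxy, sxx, syy = _centered_sums(xs, ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys) -> tuple[float, float]:
    """Least-squares fit: time ~ intercept + slope * metric."""
    mx, my, sxy, sxx, _ = _centered_sums(xs, ys)
    if sxx == 0:
        raise ValueError("metric values must not all be equal")
    slope = sxy / sxx
    return my - slope * mx, slope


def predict_slowdown(load: float, intercept: float, slope: float) -> float:
    """Predicted time at `load` relative to the fitted idle-node time (load 0)."""
    if intercept <= 0:
        raise ValueError("fitted idle-node time must be positive")
    return (intercept + slope * load) / intercept
```

In use, one would call `measure_once()` repeatedly while the load programs run at different intensities, then pass the collected load averages and probe times to `fit_line` and `pearson`. A correlation close to 1 would suggest that load average alone predicts this probe's execution time well; a weak correlation would point towards richer metrics such as VmstatR or a combination of CPU utilization and load.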

Primary author

Co-authors

Mr Helge Meinhard (CERN) Mr Olof Barring (CERN) Mr P S Dhekne (BARC) Mr R S Mundada (BARC) Mrs Sonika Sachdeva (BARC) Mr Tony Cass (CERN)
