Speaker
Mr Rohitashva Sharma (BARC)
Description
It is important to know the Quality of Service offered by the nodes in a cluster, both for
users and for load-balancing programs such as LSF, PBS and CONDOR, when submitting a job to
a given node. This helps in achieving optimal utilization of the nodes in a cluster.
Simple metrics such as load average and memory utilization do not adequately describe
the load on the nodes or the Quality of Service (QoS) experienced by user jobs on the
nodes.

We undertook a project to predict the Quality of Service seen by a user job on a
cluster node by correlating simple metrics such as load average, memory utilization and
I/O on the node. This paper presents our efforts and the methodology we followed for
predicting the QoS of nodes in a cluster.

Brief description of the approach followed: User
jobs are divided mainly into CPU-intensive, memory-intensive and I/O-intensive jobs. We
created probe programs to represent each type of job, and load
programs to generate different types of load on the system. We used the
EDG-Fabric-Monitoring System to monitor system metrics on the nodes in the cluster. The execution
times of the sample probe programs and the values of the system metrics were measured under different
load conditions. We then tried to correlate the execution times of the probe programs with the values
of the system metrics. The resulting correlation metric gives a better measure of the Quality of Service
(QoS) experienced by user programs (probes) in the system. Based on our experience,
we added a metric called ‘VmstatR’ to the monitoring system.

We derived the QoS metric in
three different ways: (I) using the Unix load average metric, (II) using the VmstatR metric, and
(III) using CPU utilization and the load on the node. We will discuss the variations
between the measured execution times of the sample probe programs and the execution times
predicted by the QoS metric derived in each of these ways.

We also studied the
behaviour of the CMSIM (simulation) and ORCA (reconstruction) programs under various load
conditions and tried to find a correlation metric to predict QoS for these jobs.
Finally, we will present the difficulties experienced in predicting Quality of Service on
nodes in a cluster.
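
The step of correlating probe execution time with a system metric can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: it assumes a simple linear relation between probe execution time and load average, the names `fit_line`, `pearson_r`, `predict_runtime` and `qos_score` are hypothetical, and the sample numbers are synthetic placeholders rather than measurements from this work.

```python
from math import sqrt
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y ~ a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def predict_runtime(a, b, load):
    """Predicted probe execution time (seconds) at a given load value."""
    return a + b * load


def qos_score(a, b, load):
    """Hypothetical QoS score: unloaded runtime divided by predicted runtime.

    Equals 1.0 on an idle node and decreases as the load grows
    (assuming a > 0 and b >= 0).
    """
    return a / predict_runtime(a, b, load)


# Synthetic placeholder data (NOT measured values): load average on a node
# and the execution time in seconds of a CPU-intensive probe at that load.
loads = [0.0, 1.0, 2.0, 3.0, 4.0]
runtimes = [10.0, 20.0, 30.0, 40.0, 50.0]

a, b = fit_line(loads, runtimes)
r = pearson_r(loads, runtimes)
predicted = predict_runtime(a, b, 2.5)
score = qos_score(a, b, 2.5)
```

The same fit could be repeated with a different metric in place of load average (for example, a run-queue length such as the one VmstatR is presumably based on) and the prediction errors compared; whether a linear model is adequate is itself an assumption that would have to be checked against real measurements.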
Primary author
Mr Rohitashva Sharma (BARC)
Co-authors
Mr Helge Meinhard (CERN)
Mr Olof Barring (CERN)
Mr P S Dhekne (BARC)
Mr R S Mundada (BARC)
Mrs Sonika Sachdeva (BARC)
Mr Tony Cass (CERN)