Speaker
Description
Keywords
Scheduling, Reinforcement Learning
Impact
We developed a simulation framework for experimentation and validation. The most important setting is the non-linear continuous approximation of the value function, for which we have explored various neural network architectures. Sparse neural networks make it possible to represent a modal variable describing user identity; introducing this variable has a significant impact on performance.
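As a minimal sketch of this setting (the actual architecture, feature set, dimensions, and training procedure in our framework may differ; all names below are illustrative), the user-identity variable can be encoded as a sparse one-hot vector concatenated with the continuous state features before being fed to a small feed-forward approximator of the value function:

import numpy as np

# Hypothetical dimensions: 8 continuous state features, 50 distinct users.
N_CONT, N_USERS, N_HIDDEN = 8, 50, 32

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(N_CONT + N_USERS, N_HIDDEN))
w2 = rng.normal(scale=0.1, size=N_HIDDEN)

def encode_state(cont_features, user_id):
    """Concatenate continuous features with a sparse one-hot user encoding."""
    one_hot = np.zeros(N_USERS)
    one_hot[user_id] = 1.0
    return np.concatenate([cont_features, one_hot])

def value(state):
    """Non-linear continuous approximation of the value function."""
    hidden = np.tanh(state @ W1)
    return float(hidden @ w2)

# Example: a job state submitted by user 3.
s = encode_state(rng.normal(size=N_CONT), user_id=3)
print(value(s))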
The experimental validation exploits various segments (typically one week) of a trace recorded at the LAL site, which is equipped with a MAUI/PBS scheduler and has Virtual Reservations enabled. Various performance metrics are examined, namely the distributions of the original utility function, the relative overhead (ratio of waiting time to execution time), the absolute waiting time, and the distance to the optimal fair-share. The RL scheduler consistently outperforms the native scheduler with respect to QoS, while exhibiting similar fair-share performance, but requires a significant training period.
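The overhead and fair-share metrics can be computed directly from the trace; the sketch below shows one possible formulation (the L1 distance over normalized VO shares is our illustrative choice here, not necessarily the exact distance used in the evaluation):

import numpy as np

def relative_overhead(wait_times, exec_times):
    """Relative overhead: ratio of waiting time to execution time, per job."""
    return np.asarray(wait_times) / np.asarray(exec_times)

def fair_share_distance(cpu_per_vo, target_shares):
    """Distance between achieved VO shares and the administratively defined
    target shares (illustrated here as an L1 distance over normalized shares)."""
    achieved = np.asarray(cpu_per_vo, dtype=float)
    achieved /= achieved.sum()
    target = np.asarray(target_shares, dtype=float)
    target /= target.sum()
    return float(np.abs(achieved - target).sum())

# Example with made-up numbers: waiting/execution times in seconds, CPU time per VO.
print(relative_overhead([30.0, 5.0], [600.0, 120.0]))
print(fair_share_distance([400.0, 250.0, 350.0], [0.5, 0.2, 0.3]))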
URL for further information
http://www.grid-observatory.org
Detailed analysis
Our application of RL to site scheduling discovers online a policy that maps the site's states to the decisions the scheduler ought to take in those states so as to maximize long-term cumulative rewards. Compact state descriptions allow the scheduling process to be steered through high-level objectives. The requirement for differentiated QoS is expressed through parameterized utility functions associated with responsive and batch jobs; as they describe how "satisfied" the user will be if his/her job finishes after a certain delay, the parameters have an intuitive interpretation. The fair-share reward controls the compliance of the scheduling process with the shares given to each VO, which are independently defined by the various grid stakeholders, and can cope with under-utilized shares. Using the State-Action-Reward-State-Action (SARSA) algorithm, the RL scheduler can quickly adapt its decisions to the non-stationary distributions of inter-arrival time, load, and QoS requirements featured by EGEE.
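To make the two ingredients concrete, the sketch below pairs an illustrative sigmoid-shaped utility (the deadline and sharpness parameters stand in for the per-class parameters of responsive and batch jobs; the actual functional form and parameter values are not fixed here) with a plain tabular SARSA update, whereas the scheduler itself uses the neural approximation of the value function described above:

import numpy as np

def utility(delay, deadline, sharpness=0.05):
    """Illustrative parameterized utility: user satisfaction decreases
    smoothly as the job's completion delay exceeds its class deadline."""
    return 1.0 / (1.0 + np.exp(sharpness * (delay - deadline)))

def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """One SARSA step: Q(s,a) <- Q(s,a) + alpha * (r + gamma*Q(s',a') - Q(s,a))."""
    td_error = r + gamma * q[s_next, a_next] - q[s, a]
    q[s, a] += alpha * td_error
    return q

# Toy example: 4 states, 2 actions; the QoS reward comes from the utility.
Q = np.zeros((4, 2))
reward = utility(delay=120.0, deadline=300.0)
Q = sarsa_update(Q, s=0, a=1, r=reward, s_next=2, a_next=0)
print(Q)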
Conclusions and Future Work
Multi-objective RL provides a method to neatly combine heterogeneous goals and discover policies that satisfy them. This work deals with QoS and fair-share, but green computing objectives could be integrated as well. Ongoing work includes multi-scale reinforcement learning to handle the different time scales of the objective functions, and hybrid methods in which offline RL, online RL, and the native scheduler are combined to cope with transitory periods.