Reinforcement learning models for scheduling in wireless networks
Kok-Lim Alvin YAU, Kae Hsiang KWONG, Chong SHEN
The dynamicity of available resources and network conditions, such as channel capacity and traffic characteristics, poses major challenges to scheduling in wireless networks. Reinforcement learning (RL) enables wireless nodes to observe their respective operating environments, learn, and make optimal or near-optimal scheduling decisions. Learning, the main intrinsic characteristic of RL, allows wireless nodes to adapt over time to most forms of dynamicity in the operating environment. This paper presents an extensive review of the application of traditional and enhanced RL approaches to various types of scheduling schemes in wireless networks, namely packet, sleep-wake, and task schedulers, together with the advantages and performance enhancements brought about by RL. It also discusses how various challenges associated with scheduling schemes have been addressed using RL. Finally, we examine open issues related to RL-based scheduling schemes in wireless networks in order to identify new research directions in this area. The discussion is presented in a tutorial manner to establish a foundation for further research in this field.
Keywords: reinforcement learning / scheduling / wireless networks
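As a concrete illustration of the observe-learn-decide loop that the abstract attributes to RL-based schedulers, the sketch below trains a tabular Q-learning agent on a toy two-queue packet-scheduling problem. This is a minimal, hypothetical example, not any scheme surveyed in the paper: the state, action, and reward definitions are our own assumptions (state = which queue is longer, action = which queue to serve, reward = 1 when the served queue is non-empty).

```python
import random

def q_learning_scheduler(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Train a tabular Q-learning agent on a toy two-queue scheduling task."""
    rng = random.Random(seed)
    # State 0: queue 0 is at least as long; state 1: queue 1 is longer.
    # Action a: serve queue a. Q-table over all (state, action) pairs.
    q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    for _ in range(episodes):
        queues = [rng.randint(0, 5), rng.randint(0, 5)]  # random backlogs
        state = 0 if queues[0] >= queues[1] else 1
        # Epsilon-greedy selection: mostly exploit, occasionally explore.
        if rng.random() < epsilon:
            action = rng.choice((0, 1))
        else:
            action = max((0, 1), key=lambda a: q[(state, a)])
        # Reward 1 for serving a non-empty queue, 0 for a wasted slot.
        reward = 1.0 if queues[action] > 0 else 0.0
        if queues[action] > 0:
            queues[action] -= 1
        next_state = 0 if queues[0] >= queues[1] else 1
        # Standard Q-learning update toward the bootstrapped target.
        best_next = max(q[(next_state, a)] for a in (0, 1))
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q

if __name__ == "__main__":
    q_table = q_learning_scheduler()
    # After training, the agent should prefer serving the longer queue.
    print(q_table[(0, 0)] > q_table[(0, 1)])
```

The same update rule underlies most of the packet, sleep-wake, and task schedulers reviewed in the paper; what differs across schemes is how the state, action, and reward are mapped onto channel conditions, traffic characteristics, and energy budgets.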