A parallel scheduling algorithm for reinforcement learning in large state space
Quan LIU, Xudong YANG, Ling JING, Jin LI, Jiao LI
The main challenge in reinforcement learning is scaling up to larger and more complex problems. To address this scaling problem, a scalable reinforcement learning method, DCS-SRL, is proposed on the basis of a divide-and-conquer strategy, and its convergence is proved. In this method, a learning problem with a large or continuous state space is decomposed into multiple smaller subproblems. Given a specific learning algorithm, each subproblem can be solved independently with limited available resources, and the component solutions are then recombined to obtain the desired result. To decide the order in which the scheduler handles subproblems, a weighted priority scheduling algorithm is proposed; it ensures that computation is focused on the regions of the problem space expected to be most productive. To further expedite learning, a parallel method, DCS-SPRL, is derived by combining DCS-SRL with a parallel scheduling architecture; its subproblems are distributed among processors that work in parallel. Experimental results show that learning based on DCS-SPRL converges quickly and scales well.
divide-and-conquer strategy / parallel schedule / scalability / large state space / continuous state space
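The abstract only outlines DCS-SRL at a high level, so the following is a minimal illustrative sketch of the general idea rather than the paper's algorithm: a toy chain MDP is divided into state-space regions (subproblems), each region is refined by local value-iteration sweeps, and a weighted priority rule decides which region to refine next. The toy MDP, the fixed-size partition, the per-region weights, and the residual-based priority formula are all assumptions made for illustration.

```python
# Illustrative sketch (not the authors' implementation): divide-and-conquer value
# iteration on a toy 1-D chain MDP with weighted priority scheduling of subproblems.
N, GAMMA, GOAL = 100, 0.95, 99          # states 0..99, goal at the right end
ACTIONS = (-1, +1)                      # move left / move right

def step(s, a):
    """Deterministic toy dynamics; reward 1 only when the goal is reached."""
    s2 = min(max(s + a, 0), N - 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

def bellman_backup(V, s):
    return max(r + GAMMA * V[s2] for s2, r in (step(s, a) for a in ACTIONS))

def sweep(V, region):
    """One Gauss-Seidel sweep over a region (right to left); returns the max residual."""
    err = 0.0
    for s in reversed(region):
        new = bellman_backup(V, s)
        err = max(err, abs(new - V[s]))
        V[s] = new
    return err

# Divide: partition the state space into fixed-size regions (the subproblems).
regions = [range(lo, min(lo + 20, N)) for lo in range(0, N, 20)]
weights = [1.0 + 0.1 * i for i in range(len(regions))]  # assumed weights, here favouring
                                                        # regions nearer the goal
V = [0.0] * N

# Conquer with weighted priority scheduling: repeatedly pick the subproblem whose
# weighted Bellman residual is largest and refine it; stop once all residuals are small.
for t in range(500):
    residuals = [max(abs(bellman_backup(V, s) - V[s]) for s in reg) for reg in regions]
    priorities = [w * e for w, e in zip(weights, residuals)]
    best = max(range(len(regions)), key=lambda i: priorities[i])
    if priorities[best] < 1e-3:
        break
    sweep(V, regions[best])

print("scheduling steps:", t, " V(0) = %.3f" % V[0], " V(98) = %.3f" % V[98])
```

In the parallel variant described in the abstract (DCS-SPRL), the refinement of different regions would be dispatched to separate processors rather than run sequentially, with the scheduler still allocating effort according to the weighted priorities.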