Reasoning and predicting POMDP planning complexity via covering numbers
Zongzhang ZHANG, Qiming FU, Xiaofang ZHANG, Quan LIU
Partially observable Markov decision processes (POMDPs) provide a rich mathematical framework for planning tasks in partially observable stochastic environments. The covering number, a measure of the size of the search space of a POMDP planning problem, has been proposed as a complexity measure for approximate POMDP planning. Existing theoretical results are restricted to POMDPs with finite, discrete state spaces, are measured in the l1-metric space, and assume that any heuristics used are admissible. This paper extends the theoretical results on the covering numbers of different search spaces, including the newly defined space reachable under inadmissible heuristics, to the ln-metric spaces. We also provide a simple but scalable algorithm for estimating covering numbers. Experimentally, we report estimated covering numbers of the search spaces reachable by following different policies on several benchmark problems, and analyze how well these estimates predict the runtime of POMDP planning algorithms.
POMDP planning / complexity theory / covering number / search space
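To make the notion concrete: the covering number of a set of beliefs at resolution delta is the minimum number of delta-balls (here in the l1 metric) whose union contains the set. The sketch below is a minimal greedy estimator over sampled belief points, written in Python purely as an illustration under that definition; the belief sampling, the function name, and the greedy center selection are assumptions for this example and not the estimation algorithm proposed in the paper.

```python
import numpy as np

def greedy_covering_number(beliefs, delta):
    """Greedily estimate the covering number of a set of sampled beliefs
    under the l1 metric: count how many delta-balls are needed so that
    every sampled belief lies within delta of some chosen center.

    beliefs: (n, |S|) array, each row a probability distribution over states.
    delta:   covering radius in the l1 metric.
    """
    uncovered = list(range(len(beliefs)))
    centers = []
    while uncovered:
        # Pick an arbitrary uncovered belief as the next ball center.
        c = uncovered[0]
        centers.append(c)
        # Discard every belief within l1-distance delta of the new center.
        uncovered = [i for i in uncovered
                     if np.abs(beliefs[i] - beliefs[c]).sum() > delta]
    # The number of centers is an upper bound on the true covering number.
    return len(centers)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy example: random beliefs over a 4-state POMDP (Dirichlet draws).
    sampled = rng.dirichlet(np.ones(4), size=500)
    print(greedy_covering_number(sampled, delta=0.2))
```

Greedy covering gives only an upper bound, but it scales linearly in the number of sampled beliefs per pass, which is why a greedy scheme is a natural starting point for estimation on large benchmark problems.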