First passage Markov decision processes with constraints and varying discount factors

Xiao WU; Xiaolong ZOU; Xianping GUO

doi:10.1007/s11464-015-0479-6

Front. Math. China ›› 2015, Vol. 10 ›› Issue (4) : 1005 -1023. DOI: 10.1007/s11464-015-0479-6

RESEARCH ARTICLE

First passage Markov decision processes with constraints and varying discount factors

Author information +

History +

PDF (175KB)

Abstract

This paper focuses on the constrained optimality problem (COP) of first passage discrete-time Markov decision processes (DTMDPs) in denumerable state and compact Borel action spaces with multi-constraints, state-dependent discount factors, and possibly unbounded costs. By means of the properties of a so-called occupation measure of a policy, we show that the constrained optimality problem is equivalent to an (infinite-dimensional) linear programming on the set of occupation measures with some constraints, and thus prove the existence of an optimal policy under suitable conditions. Furthermore, using the equivalence between the constrained optimality problem and the linear programming, we obtain an exact form of an optimal policy for the case of finite states and actions. Finally, as an example, a controlled queueing system is given to illustrate our results.

Keywords

Discrete-time Markov decision process (DTMDP) / constrained optimality / varying discount factor / unbounded cost

Cite this article

Download citation ▾

Xiao WU, Xiaolong ZOU, Xianping GUO. First passage Markov decision processes with constraints and varying discount factors. Front. Math. China, 2015, 10(4): 1005-1023 DOI:10.1007/s11464-015-0479-6

登录浏览全文

4963

注册一个新账户忘记密码

References

Publishing order | Descend order by publishing year | Descend order by cited within

[1]	Altman E. Denumerable constrained Markov decision processes and finite approximations. Math Methods Oper Res, 1994, 19: 169-191

[2]	Altman E. Constrained Markov Decision Processes. Boca Raton: Chapman & Hall/CRC, 1999

[3]	Alvarez-Mena J, Hernández-Lerma O. Convergence of the optimal values of constrained Markov control processes. Math Methods Oper Res, 2002, 55: 461-484

[4]	Borkar V. A convex analytic approach to Markov decision processes. Probab Theory Related Fields, 1988, 78: 583-602

[5]	Derman C. Finite State Markovian Decision Processes. Mathematics in Science and Engineering, Vol 67. New York: Academic Press, 1970

[6]	Dufour F, Prieto-Rumeau T. Finite linear programming approximations of constrained discounted Markov decision processes. SIAM J Control Optim, 2013, 51: 1298-1324

[7]	González-Hernández J, Hernández-Lerma O. Extreme points of sets of randomized strategies in constrained optimization and control problems. SIAM J Optim, 2005, 15: 1085-1104

[8]	Guo X P, Hernández-del-Valle A, Hernández-Lerma O. First passage problems for nonstationary discrete-time stochastic control systems. Eur J Control, 2012, 18: 528-538

[9]	Guo X P, Hernández-Lerma O. Continuous-Time Markov Decision Processes: Theory and Applications. Berlin: Springer-Verlag, 2009

[10]	Guo X P, Piunovskiy A. Discounted continuous-time Markov decision processes with constraints: unbounded transition and loss rates. Math Oper Res, 2011, 36(1): 105-132

[11]	Guo X P, Song X Y, Zhang Y. First passage criteria for continuous-time Markov decision processes with varying discount factors and history-dependent policies. IEEE Trans Automat Control, 2014, 59: 163-174

[12]	Guo X P, Zhang W Z. Convergence of controlled models and finite-state approximation for discounted continuous-time Markov decision processes with constraints. Eur J Control, 2014, 238: 486-496

[13]	Hernández-Lerma O, González-Hernández J. Constrained Markov decision processes in Borel spaces: the discounted case. Math Methods Oper Res, 2000, 52: 271-285

[14]	Hernández-Lerma O, Lasserre J B. Discrete-Time Markov Control Processes. Basic Optimality Criteria. New York: Springer-Verlag, 1996

[15]	Hernández-Lerma O, Lasserre J B. Further Topics on Discrete-Time Markov Control Processes. New York: Springer-Verlag, 1999

[16]	Huang Y H, Wei Q D, Guo X P. Constrained Markov decision processes with first passage criteria. Ann Oper Res, 2013, 206: 197-219

[17]	Mao X, Piunovskiy A. Strategic measures in optimal control problems for stochastic sequences. Stoch Anal Appl, 2000, 18: 755-776

[18]	Piunovskiy A. Optimal Control of Random Sequences in Problems with Constraints. Dordrecht: Kluwer Academic, 1997

[19]	Piunovskiy A. Controlled random sequences: the convex analytic approach and constrained problems. Russian Math Surveys, 2000, 53: 1233-1293

[20]	Prokhorov Y. Convergence of random processes and limit theorems in probability theory. Theory Probab Appl, 1956, 1: 157-214

[21]	Saldi N, Linder T, Yüksel S. Asymptotic optimality and rates of convergence of quantized stationary policies in stochastic control. IEEE Trans Automat Control, 2015, 60 (to appear)

[22]	Sennott L I. Stochastic Dynamic Programming and the Control of Queueing Systems. New York: Wiley, 1999

[23]	Wei Q D, Guo X P. Markov decision processes with state-dependent discount factors and unbounded rewards/costs. Oper Res Lett, 2011, 39: 369-374

[24]	Wu X, Guo X P. First passage optimality and variance minimization of Markov decision processes with varying discount factors. J Appl Probab, 2015, 52(2) (to appear)