RESEARCH ARTICLE

An average-value-at-risk criterion for Markov decision processes with unbounded costs

  • Qiuli LIU 1 ,
  • Wai-Ki CHING 2 ,
  • Junyu ZHANG 3 ,
  • Hongchu WANG 1
  • 1. School of Mathematical Sciences, South China Normal University, Guangzhou 510631, China
  • 2. Advanced Modeling and Applied Computing Laboratory, Department of Mathematics, The University of Hong Kong, Hong Kong, China
  • 3. School of Mathematics, Sun Yat-Sen University, Guangzhou 510275, China

Received date: 03 Nov 2020

Accepted date: 24 May 2021

Copyright

2022 Higher Education Press

Abstract

We study Markov decision processes under the average-value-at-risk criterion. The state and action spaces are Borel spaces, the costs are allowed to be unbounded from above, and the discount factors are state-action dependent. Under suitable conditions, we establish the existence of optimal deterministic stationary policies. Furthermore, we apply our main results to a cash-balance model.

Cite this article

Qiuli LIU, Wai-Ki CHING, Junyu ZHANG, Hongchu WANG. An average-value-at-risk criterion for Markov decision processes with unbounded costs[J]. Frontiers of Mathematics in China, 2022, 17(4): 673-687. DOI: 10.1007/s11464-021-0944-3

