Research Article

Optimization of Markov jump linear system with controlled modes jump probabilities

  • Yankai XU ,
  • Xi CHEN
  • Tsinghua National Laboratory for Information Science and Technology, Center for Intelligent and Networked Systems, Department of Automation, Tsinghua University

Published date: 05 Mar 2009

Copyright

2014 Higher Education Press and Springer-Verlag Berlin Heidelberg

Abstract

The optimal control of a Markov jump linear quadratic model with controlled jump probabilities of modes is investigated. Two kinds of mode control policies, i.e., the open-loop control policy and the closed-loop control policy, are considered. Using the concepts of policy iteration and performance potential, a sufficient condition under which the optimal closed-loop control policy performs better than the optimal open-loop control policy is proposed. This condition is helpful for the design of an optimal controller. Furthermore, an efficient algorithm based on policy iteration is given to construct a closed-loop control policy that is better than the optimal open-loop control policy.

Cite this article

Yankai XU, Xi CHEN. Optimization of Markov jump linear system with controlled modes jump probabilities[J]. Frontiers of Electrical and Electronic Engineering, 2009, 4(1): 55-59. DOI: 10.1007/s11460-008-0076-5

Introduction

In recent years, switching systems have received great attention because of their potential applications in engineering systems [1]. The Markov jump linear system (MJLS) is a class of switching systems that has been studied extensively. MJLSs are widely applied to systems with abrupt changes in operating points or disturbances [2,3], including flexible manufacturing systems, power systems, economic systems, fault-tolerant systems and inventory systems [4-10].
This paper considers the discrete-time jump linear quadratic Gaussian (JLQG) model. The mode jump of a standard Markov jump system is governed by a Markov chain whose transition probability matrix is given a priori. In practice, the jump of modes is random, but the jump probabilities can often be controlled (or selected from a finite set). For example, the probability of a machine jumping from the normal mode to the fault mode depends on the daily maintenance frequency, and the switch between the received and lost modes of data packets in networked control systems depends on the strength of the communication signal. However, problems with controlled mode jump probabilities are rarely studied [11-13]. This paper discusses two classes of control policies for the mode jump probabilities: open-loop mode control and closed-loop mode control. The relation between the two mode control policies and the standard JLQG model is analyzed. Generally, optimization of the open-loop mode control is an easier problem over a smaller policy space compared with optimization of the closed-loop mode control. By using the performance potential concept and the policy iteration approach, this paper presents a sufficient condition under which the optimal closed-loop mode control is better than the optimal open-loop mode control. Based on this condition, we can easily construct a closed-loop mode control policy that has better performance than the optimal open-loop mode control.

Formulation

The Markov jump system is a class of hybrid systems with two kinds of dynamics: the mode, described by a discrete Markov chain, and the state, described by a state-space equation under each mode. Consider the following discrete-time linear system
$$x_{k+1} = A(\theta_k)x_k + B(\theta_k)u_k + \sigma(\theta_k)w_k, \qquad (1)$$
where k is the discrete time epoch, x_k ∈ R^n is the state, θ_k ∈ S = {1, 2, ..., S} is the mode, u_k ∈ R^m is the control variable, and w_k is an i.i.d. random variable with mean zero and identity covariance matrix. A(θ_k), B(θ_k), σ(θ_k) are matrices of suitable dimensions depending on the mode. The mode θ_k follows a Markov chain whose jump probability matrix is p = {p_ij}, i, j ∈ S. Assume that the Markov chain is ergodic with steady-state probability row vector π = [π_1, π_2, ..., π_S]. The criterion to be optimized is
$$J(x_0,\theta_0) = \lim_{K\to\infty}\frac{1}{K}\,E\Big\{\sum_{k=0}^{K-1} f_u(\theta_k, x_k)\,\Big|\, x_0, \theta_0\Big\}, \qquad (2)$$
where the cost function is
$$f_u(\theta_k, x_k) = x_k^T M(\theta_k) x_k + u_k^T N(\theta_k) u_k.$$
Suppose that Eq. (1) is stochastically stabilizable. For a stable system, performance criterion Eq. (2) exists and is independent of the initial values [5]. For simplicity, when θ_k = i, the matrices A(θ_k), B(θ_k), σ(θ_k), M(θ_k) and N(θ_k) are written as A_i, B_i, σ_i, M_i and N_i, respectively. Let a_i = σ_i σ_i^T. Assume that M_i and N_i are positive semi-definite matrices.
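To make the model concrete, the following is a minimal simulation sketch of the dynamics in Eq. (1) and the average cost in Eq. (2). The two-mode data, the fixed (non-optimized) feedback gains, and the simulation horizon are hypothetical placeholders chosen for illustration, not values from this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-mode data (placeholders, not taken from the paper).
A  = [np.array([[0.7]]),  np.array([[0.95]])]
B  = [np.array([[1.0]]),  np.array([[1.0]])]
Sg = [np.array([[1.0]]),  np.array([[0.5]])]    # sigma(theta)
Mc = [np.array([[1.0]]),  np.array([[2.0]])]    # M(theta)
Nc = [np.array([[1.0]]),  np.array([[1.0]])]    # N(theta)
Lg = [np.array([[0.3]]),  np.array([[0.4]])]    # fixed gains, u_k = -L_i x_k
P  = np.array([[0.9, 0.1], [0.3, 0.7]])         # mode jump probability matrix

x, mode, cost, K_hor = np.zeros((1, 1)), 0, 0.0, 100_000
for k in range(K_hor):
    u = -Lg[mode] @ x
    cost += (x.T @ Mc[mode] @ x + u.T @ Nc[mode] @ u).item()   # f_u(theta_k, x_k)
    w = rng.standard_normal((1, 1))
    x = A[mode] @ x + B[mode] @ u + Sg[mode] @ w               # Eq. (1)
    mode = int(rng.choice(2, p=P[mode]))                        # Markov mode jump
print("estimated long-run average cost:", cost / K_hor)
```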
If the jump probabilities {p_ij, j ∈ S} from mode i to the other modes are not given a priori but can be chosen from a finite set Y_i, the jump probability matrix p = {p_ij}, i, j ∈ S, of the modes is controllable. This paper studies the optimization problem of JLQG with controlled mode jump probabilities. In this case, the admissible control policy is a combination of the feedback control law u and the mode control policy, denoted as L = {u, p}. Consider two classes of mode control policies. In the first class, the jump probability is independent of the current system state x, i.e., the jump probability takes the same value for every state x; we call this an open-loop mode control. In the second class, the jump probability depends on the current system state x, i.e., the jump probability may take different values for different states x; we call this a closed-loop mode control, and the corresponding jump probability is denoted as p_ij(x). Consider the following optimization problems:
Problem 1 Find a state feedback control law u(i, x) and a closed-loop mode control p_ij(x) to minimize performance Eq. (2). Denote the set of admissible control policies of Problem 1 as L_1 = {u(i, x), p_ij(x)}, and the optimal policy as L_1^*.
Problem 2 Find a state feedback control law u(i, x) and an open-loop mode control p_ij to minimize performance Eq. (2). Denote the set of admissible control policies of Problem 2 as L_2 = {u(i, x), p_ij}, and the optimal policy as L_2^*.
Problem 3 Given a mode jump probability matrix p, find a state feedback control law u(i, x) to minimize performance Eq. (2). Denote the set of admissible control policies of Problem 3 as L_3(p) = {u(i, x)}, and the optimal policy as L_3^*(p). Problem 3 is a standard JLQG problem, and the following lemma gives its optimal solution.
Lemma 1 [5] Given a mode jump probability matrix p = {p_ij}, i, j ∈ S, the optimal feedback control law is
$$u^*(i, x) = -L_i x,$$
$$L_i = [N_i + B_i^T F_i B_i]^{-1} B_i^T F_i A_i,$$
where F_i = Σ_{j∈S} p_ij K_j, and K_i is the unique solution to the coupled Riccati equations:
$$K_i = A_i^T F_i A_i + M_i - A_i^T F_i B_i L_i.$$
Then the optimal performance is
$$J^{L_3^*(p)} = \sum_{i\in S} \pi_i\, \mathrm{tr}(a_i F_i).$$
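Numerically, Lemma 1 can be evaluated by a fixed-point iteration on the coupled Riccati equations. The sketch below is one possible implementation under the stabilizability assumption above; the function name, the plain iteration scheme, and the fixed iteration count are our own choices, and convergence is assumed rather than checked.

```python
import numpy as np

def solve_problem3(A, B, sigma, M, N, P, iters=500):
    """Fixed-point iteration on the coupled Riccati equations of Lemma 1."""
    S = len(A)
    K = [M[i].copy() for i in range(S)]
    for _ in range(iters):
        F = [sum(P[i, j] * K[j] for j in range(S)) for i in range(S)]
        L = [np.linalg.solve(N[i] + B[i].T @ F[i] @ B[i], B[i].T @ F[i] @ A[i])
             for i in range(S)]
        K = [A[i].T @ F[i] @ A[i] + M[i] - A[i].T @ F[i] @ B[i] @ L[i]
             for i in range(S)]
    F = [sum(P[i, j] * K[j] for j in range(S)) for i in range(S)]
    # Stationary distribution pi of the (assumed ergodic) mode chain.
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmax(np.real(w))])
    pi = pi / pi.sum()
    a = [sigma[i] @ sigma[i].T for i in range(S)]
    J = float(sum(pi[i] * np.trace(a[i] @ F[i]) for i in range(S)))
    return K, L, F, pi, J
```

Given any candidate jump probability matrix p chosen from the finite sets Y_i, such a routine returns K_i, L_i, F_i, the stationary distribution π, and the average cost used below.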
For Problem 3, by applying the results in Ref. [5] and the approach in Ref. [13], it is easy to verify that the value function (also called the performance potential [14]) of the optimal feedback control law u^*(i, x) is
$$g^{L_3^*(p)}(i, x) = x^T K_i x + q_i, \qquad (3)$$
where q_i is the solution to the following equation:
$$(I - p)q + J^{L_3^*(p)}e - \tilde f = 0, \qquad (4)$$
where I is the identity matrix, q = [q_1, q_2, ..., q_S]^T, e = [1, 1, ..., 1]^T, \tilde f = [\tilde f_1, \tilde f_2, ..., \tilde f_S]^T and \tilde f_i = tr(a_i F_i). The solution of Eq. (4) is not unique: if {q_i, i ∈ S} is a solution, then for any constant c, {q_i + c, i ∈ S} is also a solution to Eq. (4). Reference [14] gives a particular solution of the following form:
$$q = (I - p + e\pi)^{-1}\tilde f.$$
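The potentials q_i of Eqs. (4) and (5) can then be computed directly. The small sketch below assumes the F_i matrices of Lemma 1 are available; the helper name and the eigenvector-based computation of π are our own choices.

```python
import numpy as np

def mode_potentials(P, F, sigma):
    """Potentials q_i of Eq. (4) via the particular solution above."""
    S = P.shape[0]
    a = [sigma[i] @ sigma[i].T for i in range(S)]
    f_tilde = np.array([np.trace(a[i] @ F[i]) for i in range(S)])
    w, v = np.linalg.eig(P.T)                       # stationary row vector pi
    pi = np.real(v[:, np.argmax(np.real(w))])
    pi = pi / pi.sum()
    q = np.linalg.solve(np.eye(S) - P + np.outer(np.ones(S), pi), f_tilde)
    J = float(pi @ f_tilde)                         # optimal average cost of Lemma 1
    return q, J
```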
If p_ij is controllable, {L_3^*(p), p_ij} is an admissible control policy of Problem 2. Let
$$p^* = \arg\min_{p}\, J^{\{L_3^*(p),\, p_{ij}\}},$$
then L_2^* = {L_3^*(p^*), p_ij^*} is the optimal admissible control policy of Problem 2, and the performance of L_3^*(p^*) is the same as that of L_2^*, i.e., J^{L_2^*} = J^{L_3^*(p^*)}. By solving Problem 3 for the candidates in the finite sets Y_i, i ∈ S, we can find the optimal open-loop mode control and thus obtain the optimal policy L_2^* of Problem 2. Another way to address Problem 2 is to apply the gradient-based method introduced in Ref. [13] to find the optimal policy.
Solving Problem 1 is rather difficult. An admissible control policy of Problem 2 can be viewed as a special admissible control policy of Problem 1, thus the policy space of Problem 2 is a subset of the policy space of Problem 1. Therefore, the optimal performance of Problem 1 is no worse than the optimal performance of Problem 2.
Dynamic systems can be formulated as Markov systems. We define the transition function P_i(B | x) as the probability of the transition from mode i, state x to the Borel set B, and P(j, B | i, x) as the probability of the transition from mode i, state x to mode j, Borel set B. Then we have P(j, B | i, x) = p_ij P_i(B | x). For any cost function f(i, x), define the transition operator
$$(Pf)(i, x) = \sum_{j\in S} p_{ij}\int_{y\in R^n} P_i(dy\,|\,x)\, f(j, y).$$
In the above definition, (Pf)(i, x) represents the expected cost at the next time epoch after a one-step transition when the current mode is i and the state is x. Let L be a control policy, which can be any admissible control policy of Problems 1, 2 and 3. Under this policy, the transition function, cost function and performance are denoted as P^L, f^L and J^L, respectively. Reference [15] gives the policy iteration formula to optimize the performance.
Lemma 2 [15] Policy L' is better than policy L if and only if:
1) For any i ∈ S, x ∈ R^n, we have
$$P^{L'} g^{L}(i, x) + f^{L'}(i, x) \le P^{L} g^{L}(i, x) + f^{L}(i, x);$$
2) There exist l ∈ S and a set X ⊂ R^n with non-zero measure such that for all x ∈ X, we have
$$P^{L'} g^{L}(l, x) + f^{L'}(l, x) < P^{L} g^{L}(l, x) + f^{L}(l, x),$$
where g^L is the performance potential under policy L, which satisfies the Poisson equation:
$$J^{L} + g^{L}(i, x) = f^{L}(i, x) + P^{L} g^{L}(i, x).$$

Analysis of two classes of mode control policies

We shall discuss the condition under which the optimal admissible control policy of Problem 1 is better than that of Problem 2.
Lemma 3 If there exist l ∈ S, a set X ⊂ R^n with non-zero measure and p̄_lj ∈ Y_l such that for mode l and all x ∈ X we have
$$\sum_{j} \bar p_{lj}\int_{y\in R^n} P_l^{L_3^*(p^*)}(dy\,|\,x)\, g^{L_2^*}(j, y) < \sum_{j} p^*_{lj}\int_{y\in R^n} P_l^{L_3^*(p^*)}(dy\,|\,x)\, g^{L_2^*}(j, y), \qquad (7)$$
then L_1^* is better than L_2^*.
Proof L_2^* can be considered as a special admissible control policy of Problem 1:
$$\tilde L_1 = \{L_3^*(p^*),\ \tilde p_{ij}(x) = p^*_{ij},\ \forall x\}.$$
Policies L_2^* and \tilde L_1 have the same performance. We can construct an admissible control policy for Problem 1 as follows:
$$L_1' = \{L_3^*(p^*),\ p'_{ij}(x)\},$$
where
$$p'_{ij}(x) = \begin{cases}\bar p_{ij}, & \text{if } i = l \text{ and } x \in X,\\ p^*_{ij}, & \text{otherwise.}\end{cases} \qquad (8)$$
Note that the cost function f_u(i, x) depends only on the feedback control law and is independent of the mode control policy; thus
$$P^{L_1'} g^{\tilde L_1}(i, x) + f^{L_1'}(i, x) = \sum_{j\in S} p'_{ij}(x)\int_{y\in R^n} P_i^{L_3^*(p^*)}(dy\,|\,x)\, g^{L_2^*}(j, y) + f^{L_3^*(p^*)}(i, x),$$
and
$$P^{\tilde L_1} g^{\tilde L_1}(i, x) + f^{\tilde L_1}(i, x) = \sum_{j\in S} p^*_{ij}\int_{y\in R^n} P_i^{L_3^*(p^*)}(dy\,|\,x)\, g^{L_2^*}(j, y) + f^{L_3^*(p^*)}(i, x).$$
According to Eqs. (7) and (8), for mode l and x ∈ X, we have
$$P^{L_1'} g^{\tilde L_1}(l, x) + f^{L_1'}(l, x) < P^{\tilde L_1} g^{\tilde L_1}(l, x) + f^{\tilde L_1}(l, x).$$
For any other mode i and state x, we have
$$P^{L_1'} g^{\tilde L_1}(i, x) + f^{L_1'}(i, x) = P^{\tilde L_1} g^{\tilde L_1}(i, x) + f^{\tilde L_1}(i, x).$$
According to Lemma 2, L_1' is better than \tilde L_1. Since L_1^* is no worse than L_1', L_1^* is better than \tilde L_1. Because L_2^* = \tilde L_1, L_1^* is better than L_2^*.
Lemma 4 If there exist l ∈ S, a set X ⊂ R^n with non-zero measure and p̄_lj ∈ Y_l such that for mode l and all x ∈ X we have
$$\sum_{j} \bar p_{lj}\big[x^T(A_l - B_l L_l)^T K_j (A_l - B_l L_l)x + \mathrm{tr}(a_l K_j) + q_j\big] < \sum_{j} p^*_{lj}\big[x^T(A_l - B_l L_l)^T K_j (A_l - B_l L_l)x + \mathrm{tr}(a_l K_j) + q_j\big], \qquad (9)$$
then L_1^* is better than L_2^*. Here K_i, q_i, L_i, i ∈ S, are the solutions to Problem 3 for the given p^*.
Proof Substituting Eq. (3) into Eq. (7) yields Eq. (9), and the result then follows from Lemma 3.
Let
$$\Pi \triangleq (A_l - B_l L_l)^T\Big[\sum_{j} K_j(\bar p_{lj} - p^*_{lj})\Big](A_l - B_l L_l)$$
and
$$\Gamma \triangleq \sum_{j}\big(\mathrm{tr}(a_l K_j) + q_j\big)\big(p^*_{lj} - \bar p_{lj}\big);$$
then Eq. (9) is equivalent to
$$x^T \Pi x < \Gamma. \qquad (10)$$
With the above lemmas, we obtain the sufficient condition under which the optimal closed-loop mode control is better than the optimal open-loop mode control.
Theorem 1 If there exist l ∈ S and p̄_lj ∈ Y_l such that one of the following three conditions is satisfied, then L_1^* is better than L_2^*:
1) Π is an indefinite matrix;
2) Π is a non-zero positive semi-definite matrix and Γ > 0;
3) Π is a non-zero negative semi-definite matrix and Γ < 0.
Here K_i, q_i, L_i, i ∈ S, are the solutions to Problem 3 for the given p^*.
Proof First, Eq. (10) cannot hold for all x ∈ R^n. Otherwise, from Lemma 2 we would obtain that p̄_lj is better than p^*_lj, which contradicts the fact that p^* is the jump probability matrix of the optimal admissible control policy of Problem 2. Thus the solution set of Eq. (10) is not the whole space R^n. Therefore, the condition of Lemma 4 is equivalent to the following: for the optimal admissible control policy L_2^* of Problem 2, there exist l ∈ S and p̄_lj ∈ Y_l such that Eq. (10) has a solution.
The matrix Π falls into one of four cases: non-zero positive semi-definite, non-zero negative semi-definite, zero, and indefinite. We discuss these four cases respectively.
When condition 1) holds, i.e., Π is indefinite, Eq. (10) has a solution no matter what value Γ takes.
When condition 2) holds, i.e., Π is non-zero positive semi-definite, we have 0 ≤ x^TΠx < +∞, so Eq. (10) has a solution (e.g., x = 0) if and only if Γ > 0, and since Π is non-zero the inequality cannot hold for all x. Thus this case requires Γ > 0.
When condition 3) holds, i.e., Π is non-zero negative semi-definite, we have -∞ < x^TΠx ≤ 0. If Γ ≥ 0, then from Lemma 2 we would obtain that p̄_lj is better than p^*_lj, which contradicts the definition of p^*. Therefore Γ < 0; Eq. (10) still has a solution, namely any x far enough along a direction in which x^TΠx < 0.
If Π is zero, then for Γ > 0 the solution set of Eq. (10) is the whole space R^n, and for Γ ≤ 0 Eq. (10) has no solution; therefore Π cannot be zero. Theorem 1 is proved.
Theorem 1 gives a sufficient condition under which the optimal closed-loop mode control is better than the optimal open-loop mode control. If this condition is satisfied, the closed-loop mode control achieves better performance than the open-loop mode control. This result is helpful for designing controllers. For the optimization problem of JLQG with controlled jump probabilities, seeking the optimal open-loop mode control is relatively simple, since the policy space is smaller. Seeking the optimal closed-loop mode control, however, is rather difficult: if the jump of modes depends on the system state, the state feedback control law can no longer be obtained from the standard JLQG model. The only existing methods are dynamic programming or simulation-based policy iteration approaches [15], which are time-consuming. Once the optimal open-loop mode control policy is obtained, we can apply the sufficient condition provided by Theorem 1 to check whether there is a better closed-loop mode control policy, and thereafter improve the mode jump probabilities.
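As an illustration of how the Theorem 1 test could be automated, the sketch below forms Π and Γ for a single candidate row p̄ of mode l and classifies Π by its eigenvalues. The function name, the argument layout, and the numerical tolerance are assumptions of ours, not part of the paper.

```python
import numpy as np

def theorem1_condition(l, p_bar, p_star, A, B, L, K, q, sigma, tol=1e-9):
    """Return True if the candidate row p_bar for mode l satisfies Theorem 1."""
    S = p_star.shape[0]
    Acl = A[l] - B[l] @ L[l]                                   # A_l - B_l L_l
    a_l = sigma[l] @ sigma[l].T
    Pi = Acl.T @ sum((p_bar[j] - p_star[l, j]) * K[j] for j in range(S)) @ Acl
    Gamma = sum((p_star[l, j] - p_bar[j]) * (np.trace(a_l @ K[j]) + q[j])
                for j in range(S))
    eig = np.linalg.eigvalsh((Pi + Pi.T) / 2)
    if eig.min() < -tol and eig.max() > tol:                   # 1) Pi indefinite
        return True
    if eig.max() > tol and eig.min() >= -tol:                  # 2) Pi nonzero psd
        return Gamma > 0
    if eig.min() < -tol and eig.max() <= tol:                  # 3) Pi nonzero nsd
        return Gamma < 0
    return False                                               # Pi numerically zero
```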
For a special case with two modes and a one-dimensional state, Theorem 1 can be simplified.
Corollary 1 Let m = n = 1 and the mode set be S = {1, 2}. If there exist l ∈ S and p̄_lj ∈ Y_l with p̄_lj ≠ p^*_lj such that
$$\frac{(\sigma_l^2 K_2 + q_2) - (\sigma_l^2 K_1 + q_1)}{(A_l - B_l L_l)^2 (K_1 - K_2)} > 0, \qquad (11)$$
then L_1^* is better than L_2^*. Here K_i, q_i, L_i, i ∈ S, are the solutions to Problem 3 for the given p^*.
Proof In this case,
$$\Pi = (A_l - B_l L_l)^2 (\bar p_{l1} - p^*_{l1})(K_1 - K_2),$$
$$\Gamma = (p^*_{l1} - \bar p_{l1})\big[(\sigma_l^2 K_1 + q_1) - (\sigma_l^2 K_2 + q_2)\big].$$
According to Theorem 1, it is easy to obtain the proof for Corollary 1.
Furthermore, we can solve for the interval X. Solving the equation Πx² = Γ, we obtain
$$x_+ = \sqrt{\frac{(\sigma_l^2 K_2 + q_2) - (\sigma_l^2 K_1 + q_1)}{(A_l - B_l L_l)^2 (K_1 - K_2)}}, \qquad x_- = -\sqrt{\frac{(\sigma_l^2 K_2 + q_2) - (\sigma_l^2 K_1 + q_1)}{(A_l - B_l L_l)^2 (K_1 - K_2)}}. \qquad (12)$$
If Eq. (11) holds, x_+ and x_- exist. If (p̄_l1 - p^*_l1)(K_1 - K_2) > 0, we have X = [x_-, x_+]; if (p̄_l1 - p^*_l1)(K_1 - K_2) < 0, the interval becomes X = (-∞, x_-] ∪ [x_+, +∞).
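In the scalar case, Eq. (11) and the interval of Eq. (12) can be evaluated in a few lines. The sketch below is ours and simply mirrors Corollary 1, returning the boundary x_+ (with x_- = -x_+) and a flag indicating whether X lies inside or outside [x_-, x_+].

```python
import math

def corollary1_interval(A_l, B_l, L_l, sigma_l, K1, K2, q1, q2, p_bar_l1, p_star_l1):
    """Scalar-case test of Eq. (11) and interval boundary of Eq. (12)."""
    ratio = ((sigma_l**2 * K2 + q2) - (sigma_l**2 * K1 + q1)) \
            / ((A_l - B_l * L_l)**2 * (K1 - K2))
    if ratio <= 0:
        return None                                    # Eq. (11) fails: no improving set X
    x_plus = math.sqrt(ratio)
    inside = (p_bar_l1 - p_star_l1) * (K1 - K2) > 0    # True:  X = [x_-, x_+]
    return x_plus, inside                              # False: X = (-inf, x_-] U [x_+, +inf)
```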
If the condition in Theorem 1 is satisfied, we can seek an admissible control policy of Problem 1 to improve the system performance. Seeking L_1^* is rather difficult. As a compromise, we try to find a good admissible control policy for Problem 1 that improves the system performance significantly compared with L_2^*. The proofs of the above lemmas and theorems are constructive, so we can construct a closed-loop mode control following the same idea. Combining the original feedback control law with this closed-loop mode control, we obtain a new policy that is better than L_2^*.
Algorithm 1 Construct an admissible control policy for Problem 1.
Step 1 Solve for the optimal admissible control policy of Problem 2, L_2^*.
Step 2 Check the condition of Theorem 1. When it is satisfied, there may exist more than one mode l ∈ S, and even within one Y_l there may exist several candidate p̄_lj's. Choose any mode l ∈ S that satisfies the condition, together with the corresponding X ⊂ R^n and p̄_lj ∈ Y_l.
Step 3 Implement a one-step policy iteration from L_2^*: construct L_1' = {L_3^*(p^*), p'_ij(x)} by using Eq. (8).
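A compact sketch of Algorithm 1 is given below. It reuses the hypothetical helpers sketched earlier (solve_problem3, mode_potentials, theorem1_condition) and assumes each Y[i] is a finite list of candidate probability rows for mode i; Step 1 enumerates these rows, which corresponds to the open-loop optimization discussed after Lemma 1.

```python
import itertools
import numpy as np

def algorithm1(A, B, sigma, M, N, Y):
    S = len(A)
    # Step 1: optimal open-loop mode control by enumerating the finite sets Y_i.
    best = None
    for rows in itertools.product(*Y):
        P = np.array(rows, dtype=float)
        K, L, F, pi, J = solve_problem3(A, B, sigma, M, N, P)
        if best is None or J < best[4]:
            best = (P, K, L, F, J)
    P_star, K, L, F, J2 = best
    q, _ = mode_potentials(P_star, F, sigma)
    # Step 2: look for a mode l and a candidate row passing the Theorem 1 test.
    for l in range(S):
        for p_bar in Y[l]:
            p_bar = np.asarray(p_bar, dtype=float)
            if np.allclose(p_bar, P_star[l]):
                continue
            if theorem1_condition(l, p_bar, P_star, A, B, L, K, q, sigma):
                # Step 3: one-step policy improvement via Eq. (8): use p_bar on the
                # set X = {x : x' Pi x < Gamma} for mode l, and p* elsewhere.
                return P_star, (l, p_bar), K, L, q
    return P_star, None, K, L, q    # no candidate passes: keep the open-loop control
```

The returned pair (l, p̄), together with Eq. (8), defines the improved closed-loop policy; if no candidate passes the Theorem 1 test, the optimal open-loop control is returned unchanged.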

Numerical example

Example 1 Consider a system with a one-dimensional state and two modes, S={1,2}.
A_1 = 0.5, B_1 = 2, M_1 = 1, N_1 = 10, σ_1 = 2;
A_2 = 1, B_2 = 1, M_2 = 2, N_2 = 10, σ_2 = 1;
Y_1 = {[0.8 0.2], [0.2 0.8]}, Y_2 = {[0.5 0.5]}.
First, we find the optimal open-loop mode control, which gives
$$p^* = \begin{bmatrix}0.8 & 0.2\\ 0.5 & 0.5\end{bmatrix}.$$
Then K1 = 1.236, K2 = 2.614, q1 = 6.549, q2 = 0.664, L1 = 0.094, L2 = 0.161. When l = 1, we have
$$\frac{(\sigma_1^2 K_2 + q_2) - (\sigma_1^2 K_1 + q_1)}{(A_1 - B_1 L_1)^2 (K_1 - K_2)} = 2.772 > 0;$$
thus the optimal admissible control policy of Problem 1, L_1^*, is better than L_2^*. According to Eq. (12), we have
X=[-1.665,1.665].
Using Algorithm 1, we can construct a better closed-loop mode control: when x ∈ X,
$$p'(x) = \begin{bmatrix}0.2 & 0.8\\ 0.5 & 0.5\end{bmatrix};$$
otherwise, p'(x) = p^*. Through simulation, the system performance under policy L_2^* is 4.87, while the performance under the constructed policy L_1' is 4.06, a 17% improvement over L_2^*.

Conclusions

This paper considers two classes of mode control policies and gives a sufficient condition under which the optimal closed-loop mode control is better than the optimal open-loop mode control. When optimizing the JLQG model with controlled mode jump probabilities, we can first find the optimal feedback control law and the optimal open-loop mode control, and then check the sufficient condition of Theorem 1. If it is satisfied, we can further seek better closed-loop mode control policies; otherwise, the open-loop mode control is good enough, and we can stop searching to avoid heavy computation. If the optimal closed-loop mode control is difficult to obtain, we can apply Algorithm 1 to construct a closed-loop mode control that significantly improves the system performance. The numerical example verifies the efficiency of the proposed algorithm.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 60574064, 60736027).
References

1. Cheng D Z, Guo Y Q. Advances on switched systems. Control Theory and Applications, 2005, 22(6): 954-960 (in Chinese)
2. Abou-Kandil H, De Smet O, Freiling G, et al. Flow control in a failure-prone multi-machine manufacturing system. In: Proceedings of INRIA/IEEE Symposium on Emerging Technologies and Factory Automation. 1995, 2: 575-583
3. Boukas E K, Shi P, Andijani A. Robust inventory-production control problem with stochastic demand. Optimal Control Applications and Methods, 1999, 20(1): 1-20
4. Ji Y, Chizeck H J. Controllability, stabilizability, and continuous-time Markovian jump linear quadratic control. IEEE Transactions on Automatic Control, 1990, 35(7): 777-788
5. Costa O, Fragoso M D, Marques R P. Discrete-Time Markov Jump Linear Systems. London: Springer-Verlag, 2005
6. Xue F, Guo L. Necessary and sufficient conditions for adaptive stabilizability of jump linear systems. Communications in Information and Systems, 2001, 1(2): 205-224
7. Zhang L J, Li C W, Cheng D Z. Robust adaptive control of Markov jump systems with parameter uncertainties. Control and Decision, 2005, 20(9): 1030-1033 (in Chinese)
8. Liu F. Robust L2-L∞ filtering for uncertain jump systems. Control and Decision, 2005, 20(1): 32-35 (in Chinese)
9. Liu F, Zhang X H. Robust control for jump systems with L2 gain constraints. Control Theory and Applications, 2006, 23(3): 373-377 (in Chinese)
10. Liu F, Su H Y, Chu J. Robust positive real control of Markov jump systems with parametric uncertainties. Acta Automatica Sinica, 2003, 29(5): 761-766 (in Chinese)
11. Ji Y, Chizeck H J. Optimal quadratic control of jump linear systems with separately controlled transition probabilities. International Journal of Control, 1989, 49(2): 481-491
12. Boukas E K, Liu Z K. Jump linear quadratic regulator with controlled jump rates. IEEE Transactions on Automatic Control, 2001, 46(2): 301-305
13. Xu Y K, Chen X. Discrete-time JLQG with dependently controlled jump probabilities. In: Proceedings of the IEEE 22nd International Symposium on Intelligent Control, Singapore. 2007: 441-445
14. Cao X R. Stochastic Learning and Optimization: A Sensitivity-Based Approach. New York: Springer, 2007
15. Zhang K J, Xu Y K, Chen X, et al. Policy iteration based feedback control. Automatica, 2008, 44(4): 1055-1061
