Research Article

Optimization of Markov jump linear system with controlled modes jump probabilities

  • Yankai XU ,
  • Xi CHEN
  • Tsinghua National Laboratory for Information Science and Technology, Center for Intelligent and Networked Systems, Department of Automation, Tsinghua University

Published date: 05 Mar 2009

Copyright

2014 Higher Education Press and Springer-Verlag Berlin Heidelberg

Abstract

The optimal control of a Markov jump linear quadratic model with controlled jump probabilities of modes is investigated. Two kinds of mode control policies, i.e., the open-loop control policy and the closed-loop control policy, are considered. Using the concepts of policy iteration and performance potential, a sufficient condition under which the optimal closed-loop control policy performs better than the optimal open-loop control policy is proposed. This condition is helpful for the design of an optimal controller. Furthermore, an efficient algorithm based on policy iteration is given to construct a closed-loop control policy that is better than the optimal open-loop control policy.

Cite this article

Yankai XU, Xi CHEN. Optimization of Markov jump linear system with controlled modes jump probabilities[J]. Frontiers of Electrical and Electronic Engineering, 2009, 4(1): 55-59. DOI: 10.1007/s11460-008-0076-5

Introduction

In recent years, switching systems have received great attention because of their potential applications in engineering systems [1]. The Markov jump linear system (MJLS) is a class of switching systems that has been studied extensively. MJLSs are widely applied to systems with abrupt changes in operating points or disturbances [2,3], including flexible manufacturing systems, power systems, economic systems, fault-tolerant systems and inventory systems [4-10].
This paper considers the discrete-time jump linear quadratic Gaussian (JLQG) model. The mode jump of a standard Markov jump system is governed by a Markov chain whose transition probability matrix is given a priori. In practice, the jump of modes is random, but the jump probabilities can often be controlled (or selected from a finite set). For example, the probability of a machine jumping from the normal mode to the fault mode depends on the daily maintenance frequency, and the switch between the received and lost modes of data packets in networked control systems depends on the strength of the communication signal. However, problems with controlled mode jump probabilities are rarely studied [11-13]. This paper discusses two classes of control policies for the mode jump probabilities: open-loop mode control and closed-loop mode control. The relation between the two mode control policies and the standard JLQG model is analyzed. Generally, optimization of the open-loop mode control is an easier problem over a smaller policy space compared with optimization of the closed-loop mode control. By using the performance potential concept and the policy iteration approach, this paper presents a sufficient condition under which the optimal closed-loop mode control is better than the optimal open-loop mode control. Based on this condition, we can easily construct a closed-loop mode control policy that has better performance than the optimal open-loop mode control.

Formulation

The Markov jump system is a class of hybrid systems with two kinds of dynamics: the mode, described by a discrete Markov chain, and the state, described by a state-space equation under each mode. Consider the following discrete-time linear system
$$x_{k+1} = A(\theta_k)x_k + B(\theta_k)u_k + \sigma(\theta_k)w_k, \qquad (1)$$
where k is the discrete time epoch, x_k ∈ R^n is the state, θ_k ∈ S = {1, 2, ..., S} is the mode, u_k ∈ R^m is the control variable, and w_k is an i.i.d. random variable with mean zero and identity covariance matrix. A(θ_k), B(θ_k), σ(θ_k) are matrices of suitable dimensions depending on the mode. The mode θ_k follows a Markov chain whose jump probability matrix is p = {p_ij}, i, j ∈ S. Assume that the Markov chain is ergodic with steady-state probability row vector π = [π_1, π_2, ..., π_S]. The criterion to be optimized is
$$J(x_0,\theta_0) = \lim_{K\to\infty}\frac{1}{K}\,E\Big\{\sum_{k=0}^{K-1} f_u(\theta_k, x_k)\,\Big|\, x_0, \theta_0\Big\}, \qquad (2)$$
where the cost function is
$$f_u(\theta_k, x_k) = x_k^T M(\theta_k) x_k + u_k^T N(\theta_k) u_k.$$
Suppose that Eq. (1) is stochastically stabilizable. For a stable system, performance criterion Eq. (2) exists and is independent of the initial values [5]. For simplicity, when θ_k = i, the matrices A(θ_k), B(θ_k), σ(θ_k), M(θ_k) and N(θ_k) are written as A_i, B_i, σ_i, M_i and N_i, respectively. Let a_i = σ_i σ_i^T. Assume that M_i and N_i are positive semi-definite matrices.
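To make the model concrete, the following is a minimal simulation sketch of the dynamics in Eq. (1) and the average cost in Eq. (2). The two-mode data, the fixed (non-optimized) feedback gains, and the simulation horizon are hypothetical placeholders chosen for illustration, not values from this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-mode data (placeholders, not taken from the paper).
A  = [np.array([[0.7]]),  np.array([[0.95]])]
B  = [np.array([[1.0]]),  np.array([[1.0]])]
Sg = [np.array([[1.0]]),  np.array([[0.5]])]    # sigma(theta)
Mc = [np.array([[1.0]]),  np.array([[2.0]])]    # M(theta)
Nc = [np.array([[1.0]]),  np.array([[1.0]])]    # N(theta)
Lg = [np.array([[0.3]]),  np.array([[0.4]])]    # fixed gains, u_k = -L_i x_k
P  = np.array([[0.9, 0.1], [0.3, 0.7]])         # mode jump probability matrix

x, mode, cost, K_hor = np.zeros((1, 1)), 0, 0.0, 100_000
for k in range(K_hor):
    u = -Lg[mode] @ x
    cost += (x.T @ Mc[mode] @ x + u.T @ Nc[mode] @ u).item()   # f_u(theta_k, x_k)
    w = rng.standard_normal((1, 1))
    x = A[mode] @ x + B[mode] @ u + Sg[mode] @ w               # Eq. (1)
    mode = int(rng.choice(2, p=P[mode]))                        # Markov mode jump
print("estimated long-run average cost:", cost / K_hor)
```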
If the jump probabilities {p_ij, j ∈ S} from mode i to the other modes are not given a priori but can be chosen from a finite set Y_i, the jump probability matrix p = {p_ij}, i, j ∈ S, of the modes is controllable. This paper studies the optimization problem of JLQG with controlled mode jump probabilities. In this case, the admissible control policy is a combination of the feedback control law u and the mode control policy, denoted as L = {u, p}. Consider two classes of mode control policies. In the first class, the jump probability is independent of the current system state x, i.e., the jump probability takes the same value for every state x; we call this an open-loop mode control. In the second class, the jump probability depends on the current system state x, i.e., the jump probability may take different values for different states x; we call this a closed-loop mode control, and the corresponding jump probability is denoted as p_ij(x). Consider the following optimization problems:
Problem 1 Find a state feedback control law u(i, x) and a closed-loop mode control p_ij(x) to minimize performance Eq. (2). Denote the set of admissible control policies of Problem 1 as L_1 = {u(i, x), p_ij(x)}, and the optimal policy as L_1^*.
Problem 2 Find a state feedback control law u(i, x) and an open-loop mode control p_ij to minimize performance Eq. (2). Denote the set of admissible control policies of Problem 2 as L_2 = {u(i, x), p_ij}, and the optimal policy as L_2^*.
Problem 3 Given a mode jump probability matrix p, find a state feedback control law u(i, x) to minimize performance Eq. (2). Denote the set of admissible control policies of Problem 3 as L_3(p) = {u(i, x)}, and the optimal policy as L_3^*(p). Problem 3 is a standard JLQG problem, and the following lemma gives its optimal solution.
Lemma 1 [5] Given a mode jump probability matrix p = {p_ij}, i, j ∈ S, the optimal feedback control law is
$$u^*(i, x) = -L_i x,$$
$$L_i = [N_i + B_i^T F_i B_i]^{-1} B_i^T F_i A_i,$$
where F_i = Σ_{j∈S} p_ij K_j, and K_i is the unique solution to the coupled Riccati equations:
$$K_i = A_i^T F_i A_i + M_i - A_i^T F_i B_i L_i.$$
Then the optimal performance is
$$J^{L_3^*(p)} = \sum_{i\in S} \pi_i\, \mathrm{tr}(a_i F_i).$$
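Numerically, Lemma 1 can be evaluated by a fixed-point iteration on the coupled Riccati equations. The sketch below is one possible implementation under the stabilizability assumption above; the function name, the plain iteration scheme, and the fixed iteration count are our own choices, and convergence is assumed rather than checked.

```python
import numpy as np

def solve_problem3(A, B, sigma, M, N, P, iters=500):
    """Fixed-point iteration on the coupled Riccati equations of Lemma 1."""
    S = len(A)
    K = [M[i].copy() for i in range(S)]
    for _ in range(iters):
        F = [sum(P[i, j] * K[j] for j in range(S)) for i in range(S)]
        L = [np.linalg.solve(N[i] + B[i].T @ F[i] @ B[i], B[i].T @ F[i] @ A[i])
             for i in range(S)]
        K = [A[i].T @ F[i] @ A[i] + M[i] - A[i].T @ F[i] @ B[i] @ L[i]
             for i in range(S)]
    F = [sum(P[i, j] * K[j] for j in range(S)) for i in range(S)]
    # Stationary distribution pi of the (assumed ergodic) mode chain.
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmax(np.real(w))])
    pi = pi / pi.sum()
    a = [sigma[i] @ sigma[i].T for i in range(S)]
    J = float(sum(pi[i] * np.trace(a[i] @ F[i]) for i in range(S)))
    return K, L, F, pi, J
```

Given any candidate jump probability matrix p chosen from the finite sets Y_i, such a routine returns K_i, L_i, F_i, the stationary distribution π, and the average cost used below.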
For Problem 3, by applying the results in Ref. [5] and the approach in Ref. [13], it is easy to verify that the value function (also called the performance potential [14]) of the optimal feedback control law u^*(i, x) is
$$g^{L_3^*(p)}(i, x) = x^T K_i x + q_i, \qquad (3)$$
where q_i is the solution to the following equation:
$$(I - p)q + J^{L_3^*(p)}e - \tilde f = 0, \qquad (4)$$
where I is the identity matrix, q = [q_1, q_2, ..., q_S]^T, e = [1, 1, ..., 1]^T, \tilde f = [\tilde f_1, \tilde f_2, ..., \tilde f_S]^T and \tilde f_i = tr(a_i F_i). The solution of Eq. (4) is not unique: if {q_i, i ∈ S} is a solution, then for any constant c, {q_i + c, i ∈ S} is also a solution to Eq. (4). Reference [14] gives a particular solution of the following form:
$$q = (I - p + e\pi)^{-1}\tilde f.$$
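The potentials q_i of Eqs. (4) and (5) can then be computed directly. The small sketch below assumes the F_i matrices of Lemma 1 are available; the helper name and the eigenvector-based computation of π are our own choices.

```python
import numpy as np

def mode_potentials(P, F, sigma):
    """Potentials q_i of Eq. (4) via the particular solution above."""
    S = P.shape[0]
    a = [sigma[i] @ sigma[i].T for i in range(S)]
    f_tilde = np.array([np.trace(a[i] @ F[i]) for i in range(S)])
    w, v = np.linalg.eig(P.T)                       # stationary row vector pi
    pi = np.real(v[:, np.argmax(np.real(w))])
    pi = pi / pi.sum()
    q = np.linalg.solve(np.eye(S) - P + np.outer(np.ones(S), pi), f_tilde)
    J = float(pi @ f_tilde)                         # optimal average cost of Lemma 1
    return q, J
```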
If p_ij is controllable, {L_3^*(p), p_ij} is an admissible control policy of Problem 2. Let
$$p^* = \arg\min_{p}\, J^{\{L_3^*(p),\, p_{ij}\}},$$
then L_2^* = {L_3^*(p^*), p_ij^*} is the optimal admissible control policy of Problem 2, and the performance of L_3^*(p^*) is the same as that of L_2^*, i.e., J^{L_2^*} = J^{L_3^*(p^*)}. By solving Problem 3 for the candidates in the finite sets Y_i, i ∈ S, we can find the optimal open-loop mode control and thus obtain the optimal policy L_2^* of Problem 2. Another way to address Problem 2 is to apply the gradient-based method introduced in Ref. [13] to find the optimal policy.
Solving Problem 1 is rather difficult. An admissible control policy of Problem 2 can be viewed as a special admissible control policy of Problem 1, thus the policy space of Problem 2 is a subset of the policy space of Problem 1. Therefore, the optimal performance of Problem 1 is no worse than the optimal performance of Problem 2.
Dynamic systems can be formulated as Markov systems. We define the transition function P_i(B | x) as the probability of the transition from mode i, state x to the Borel set B, and P(j, B | i, x) as the probability of the transition from mode i, state x to mode j, Borel set B. Then we have P(j, B | i, x) = p_ij P_i(B | x). For any cost function f(i, x), define the transition operator
$$(Pf)(i, x) = \sum_{j\in S} p_{ij}\int_{y\in R^n} P_i(dy\,|\,x)\, f(j, y).$$
In the above definition, (Pf)(i, x) represents the expected cost at the next time epoch after a one-step transition when the current mode is i and the state is x. Let L be a control policy, which can be any admissible control policy of Problems 1, 2 and 3. Under this policy, the transition function, cost function and performance are denoted as P^L, f^L and J^L, respectively. Reference [15] gives the policy iteration formula to optimize the performance.
Lemma 2 [15] Policy L' is better than policy L if and only if:
1) For any i ∈ S, x ∈ R^n, we have
$$P^{L'} g^{L}(i, x) + f^{L'}(i, x) \le P^{L} g^{L}(i, x) + f^{L}(i, x);$$
2) There exist l ∈ S and a set X ⊂ R^n with non-zero measure such that for all x ∈ X, we have
$$P^{L'} g^{L}(l, x) + f^{L'}(l, x) < P^{L} g^{L}(l, x) + f^{L}(l, x),$$
where g^L is the performance potential under policy L, which satisfies the Poisson equation:
$$J^{L} + g^{L}(i, x) = f^{L}(i, x) + P^{L} g^{L}(i, x).$$

Analysis of two classes of mode control policies

We shall discuss the condition under which the optimal admissible control policy of Problem 1 is better than that of Problem 2.
Lemma 3 If there exist l ∈ S, a set X ⊂ R^n with non-zero measure and p̄_lj ∈ Y_l such that for mode l and all x ∈ X we have
$$\sum_{j} \bar p_{lj}\int_{y\in R^n} P_l^{L_3^*(p^*)}(dy\,|\,x)\, g^{L_2^*}(j, y) < \sum_{j} p^*_{lj}\int_{y\in R^n} P_l^{L_3^*(p^*)}(dy\,|\,x)\, g^{L_2^*}(j, y), \qquad (7)$$
then L_1^* is better than L_2^*.
Proof L_2^* can be considered as a special admissible control policy of Problem 1:
$$\tilde L_1 = \{L_3^*(p^*),\ \tilde p_{ij}(x) = p^*_{ij},\ \forall x\}.$$
Policies L_2^* and \tilde L_1 have the same performance. We can construct an admissible control policy for Problem 1 as follows:
$$L_1' = \{L_3^*(p^*),\ p'_{ij}(x)\},$$
where
$$p'_{ij}(x) = \begin{cases}\bar p_{ij}, & \text{if } i = l \text{ and } x \in X,\\ p^*_{ij}, & \text{otherwise.}\end{cases} \qquad (8)$$
Note that the cost function f_u(i, x) depends only on the feedback control law and is independent of the mode control policy; thus
$$P^{L_1'} g^{\tilde L_1}(i, x) + f^{L_1'}(i, x) = \sum_{j\in S} p'_{ij}(x)\int_{y\in R^n} P_i^{L_3^*(p^*)}(dy\,|\,x)\, g^{L_2^*}(j, y) + f^{L_3^*(p^*)}(i, x),$$
and
$$P^{\tilde L_1} g^{\tilde L_1}(i, x) + f^{\tilde L_1}(i, x) = \sum_{j\in S} p^*_{ij}\int_{y\in R^n} P_i^{L_3^*(p^*)}(dy\,|\,x)\, g^{L_2^*}(j, y) + f^{L_3^*(p^*)}(i, x).$$
According to Eqs. (7) and (8), for mode l and x ∈ X, we have
$$P^{L_1'} g^{\tilde L_1}(l, x) + f^{L_1'}(l, x) < P^{\tilde L_1} g^{\tilde L_1}(l, x) + f^{\tilde L_1}(l, x).$$
For any other mode i and state x, we have
$$P^{L_1'} g^{\tilde L_1}(i, x) + f^{L_1'}(i, x) = P^{\tilde L_1} g^{\tilde L_1}(i, x) + f^{\tilde L_1}(i, x).$$
According to Lemma 2, L_1' is better than \tilde L_1. Since L_1^* is no worse than L_1', L_1^* is better than \tilde L_1. Because L_2^* = \tilde L_1, L_1^* is better than L_2^*.
Lemma 4 If there exist l ∈ S, a set X ⊂ R^n with non-zero measure and p̄_lj ∈ Y_l such that for mode l and all x ∈ X we have
$$\sum_{j} \bar p_{lj}\big[x^T(A_l - B_l L_l)^T K_j (A_l - B_l L_l)x + \mathrm{tr}(a_l K_j) + q_j\big] < \sum_{j} p^*_{lj}\big[x^T(A_l - B_l L_l)^T K_j (A_l - B_l L_l)x + \mathrm{tr}(a_l K_j) + q_j\big], \qquad (9)$$
then L_1^* is better than L_2^*. Here K_i, q_i, L_i, i ∈ S, are the solutions to Problem 3 for the given p^*.
Proof Substituting Eq. (3) into Eq. (7) yields Eq. (9), and the result then follows from Lemma 3.
Let
$$\Pi \triangleq (A_l - B_l L_l)^T\Big[\sum_{j} K_j(\bar p_{lj} - p^*_{lj})\Big](A_l - B_l L_l)$$
and
$$\Gamma \triangleq \sum_{j}\big(\mathrm{tr}(a_l K_j) + q_j\big)\big(p^*_{lj} - \bar p_{lj}\big);$$
then Eq. (9) is equivalent to
$$x^T \Pi x < \Gamma. \qquad (10)$$
With the above lemmas, we obtain the sufficient condition under which the optimal closed-loop mode control is better than the optimal open-loop mode control.
Theorem 1 If there exist l ∈ S and p̄_lj ∈ Y_l such that one of the following three conditions is satisfied, then L_1^* is better than L_2^*:
1) Π is an indefinite matrix;
2) Π is a non-zero positive semi-definite matrix and Γ > 0;
3) Π is a non-zero negative semi-definite matrix and Γ < 0.
Here K_i, q_i, L_i, i ∈ S, are the solutions to Problem 3 for the given p^*.
Proof First, Eq. (10) cannot hold for all x ∈ R^n. Otherwise, from Lemma 2 we would obtain that p̄_lj is better than p^*_lj, which contradicts the fact that p^* is the jump probability matrix of the optimal admissible control policy of Problem 2. Thus the solution set of Eq. (10) is not the whole space R^n. Therefore, the condition of Lemma 4 is equivalent to the following: for the optimal admissible control policy L_2^* of Problem 2, there exist l ∈ S and p̄_lj ∈ Y_l such that Eq. (10) has a solution.
The matrix Π falls into one of four cases: non-zero positive semi-definite, non-zero negative semi-definite, zero, and indefinite. We discuss these four cases respectively.
When condition 1) holds, i.e., Π is indefinite, Eq. (10) has a solution no matter what value Γ takes.
When condition 2) holds, i.e., Π is non-zero positive semi-definite, we have 0 ≤ x^TΠx < +∞, so Eq. (10) has a solution (e.g., x = 0) if and only if Γ > 0, and since Π is non-zero the inequality cannot hold for all x. Thus this case requires Γ > 0.
When condition 3) holds, i.e., Π is non-zero negative semi-definite, we have -∞ < x^TΠx ≤ 0. If Γ ≥ 0, then from Lemma 2 we would obtain that p̄_lj is better than p^*_lj, which contradicts the definition of p^*. Therefore Γ < 0; Eq. (10) still has a solution, namely any x far enough along a direction in which x^TΠx < 0.
If Π is zero, then for Γ > 0 the solution set of Eq. (10) is the whole space R^n, and for Γ ≤ 0 Eq. (10) has no solution; therefore Π cannot be zero. Theorem 1 is proved.
Theorem 1 gives a sufficient condition under which the optimal closed-loop mode control is better than the optimal open-loop mode control. If this condition is satisfied, the closed-loop mode control achieves better performance than the open-loop mode control. This result is helpful for designing controllers. For the optimization problem of JLQG with controlled jump probabilities, seeking the optimal open-loop mode control is relatively simple, since the policy space is smaller. Seeking the optimal closed-loop mode control, however, is rather difficult: if the jump of modes depends on the system state, the state feedback control law can no longer be obtained from the standard JLQG model. The only existing methods are dynamic programming or simulation-based policy iteration approaches [15], which are time-consuming. Once the optimal open-loop mode control policy is obtained, we can apply the sufficient condition provided by Theorem 1 to check whether there is a better closed-loop mode control policy, and thereafter improve the mode jump probabilities.
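As an illustration of how the Theorem 1 test could be automated, the sketch below forms Π and Γ for a single candidate row p̄ of mode l and classifies Π by its eigenvalues. The function name, the argument layout, and the numerical tolerance are assumptions of ours, not part of the paper.

```python
import numpy as np

def theorem1_condition(l, p_bar, p_star, A, B, L, K, q, sigma, tol=1e-9):
    """Return True if the candidate row p_bar for mode l satisfies Theorem 1."""
    S = p_star.shape[0]
    Acl = A[l] - B[l] @ L[l]                                   # A_l - B_l L_l
    a_l = sigma[l] @ sigma[l].T
    Pi = Acl.T @ sum((p_bar[j] - p_star[l, j]) * K[j] for j in range(S)) @ Acl
    Gamma = sum((p_star[l, j] - p_bar[j]) * (np.trace(a_l @ K[j]) + q[j])
                for j in range(S))
    eig = np.linalg.eigvalsh((Pi + Pi.T) / 2)
    if eig.min() < -tol and eig.max() > tol:                   # 1) Pi indefinite
        return True
    if eig.max() > tol and eig.min() >= -tol:                  # 2) Pi nonzero psd
        return Gamma > 0
    if eig.min() < -tol and eig.max() <= tol:                  # 3) Pi nonzero nsd
        return Gamma < 0
    return False                                               # Pi numerically zero
```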
For a special case with two modes and a one-dimensional state, Theorem 1 can be simplified.
Corollary 1 Let m = n = 1 and the mode set be S = {1, 2}. If there exist l ∈ S and p̄_lj ∈ Y_l with p̄_lj ≠ p^*_lj such that
$$\frac{(\sigma_l^2 K_2 + q_2) - (\sigma_l^2 K_1 + q_1)}{(A_l - B_l L_l)^2 (K_1 - K_2)} > 0, \qquad (11)$$
then L_1^* is better than L_2^*. Here K_i, q_i, L_i, i ∈ S, are the solutions to Problem 3 for the given p^*.
Proof In this case,
$$\Pi = (A_l - B_l L_l)^2 (\bar p_{l1} - p^*_{l1})(K_1 - K_2),$$
$$\Gamma = (p^*_{l1} - \bar p_{l1})\big[(\sigma_l^2 K_1 + q_1) - (\sigma_l^2 K_2 + q_2)\big].$$
According to Theorem 1, it is easy to obtain the proof for Corollary 1.
Furthermore, we can solve for the interval X. Solving the equation Πx² = Γ, we obtain
$$x_+ = \sqrt{\frac{(\sigma_l^2 K_2 + q_2) - (\sigma_l^2 K_1 + q_1)}{(A_l - B_l L_l)^2 (K_1 - K_2)}}, \qquad x_- = -\sqrt{\frac{(\sigma_l^2 K_2 + q_2) - (\sigma_l^2 K_1 + q_1)}{(A_l - B_l L_l)^2 (K_1 - K_2)}}. \qquad (12)$$
If Eq. (11) holds, x_+ and x_- exist. If (p̄_l1 - p^*_l1)(K_1 - K_2) > 0, we have X = [x_-, x_+]; if (p̄_l1 - p^*_l1)(K_1 - K_2) < 0, the interval becomes X = (-∞, x_-] ∪ [x_+, +∞).
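In the scalar case, Eq. (11) and the interval of Eq. (12) can be evaluated in a few lines. The sketch below is ours and simply mirrors Corollary 1, returning the boundary x_+ (with x_- = -x_+) and a flag indicating whether X lies inside or outside [x_-, x_+].

```python
import math

def corollary1_interval(A_l, B_l, L_l, sigma_l, K1, K2, q1, q2, p_bar_l1, p_star_l1):
    """Scalar-case test of Eq. (11) and interval boundary of Eq. (12)."""
    ratio = ((sigma_l**2 * K2 + q2) - (sigma_l**2 * K1 + q1)) \
            / ((A_l - B_l * L_l)**2 * (K1 - K2))
    if ratio <= 0:
        return None                                    # Eq. (11) fails: no improving set X
    x_plus = math.sqrt(ratio)
    inside = (p_bar_l1 - p_star_l1) * (K1 - K2) > 0    # True:  X = [x_-, x_+]
    return x_plus, inside                              # False: X = (-inf, x_-] U [x_+, +inf)
```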
If the condition in Theorem 1 is satisfied, we can seek an admissible control policy of Problem 1 to improve the system performance. Seeking L_1^* is rather difficult. As a compromise, we try to find a good admissible control policy for Problem 1 that improves the system performance significantly compared with L_2^*. The proofs of the above lemmas and theorems are constructive, so we can construct a closed-loop mode control following the same idea. Combining the original feedback control law with this closed-loop mode control, we obtain a new policy that is better than L_2^*.
Algorithm 1 Construct an admissible control policy for Problem 1.
Step 1 Solve for the optimal admissible control policy of Problem 2, L_2^*.
Step 2 Check the condition of Theorem 1. When it is satisfied, there may exist more than one mode l ∈ S, and even within one Y_l there may exist several candidate p̄_lj's. Choose any mode l ∈ S that satisfies the condition, together with the corresponding X ⊂ R^n and p̄_lj ∈ Y_l.
Step 3 Implement a one-step policy iteration from L_2^*: construct L_1' = {L_3^*(p^*), p'_ij(x)} by using Eq. (8).
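A compact sketch of Algorithm 1 is given below. It reuses the hypothetical helpers sketched earlier (solve_problem3, mode_potentials, theorem1_condition) and assumes each Y[i] is a finite list of candidate probability rows for mode i; Step 1 enumerates these rows, which corresponds to the open-loop optimization discussed after Lemma 1.

```python
import itertools
import numpy as np

def algorithm1(A, B, sigma, M, N, Y):
    S = len(A)
    # Step 1: optimal open-loop mode control by enumerating the finite sets Y_i.
    best = None
    for rows in itertools.product(*Y):
        P = np.array(rows, dtype=float)
        K, L, F, pi, J = solve_problem3(A, B, sigma, M, N, P)
        if best is None or J < best[4]:
            best = (P, K, L, F, J)
    P_star, K, L, F, J2 = best
    q, _ = mode_potentials(P_star, F, sigma)
    # Step 2: look for a mode l and a candidate row passing the Theorem 1 test.
    for l in range(S):
        for p_bar in Y[l]:
            p_bar = np.asarray(p_bar, dtype=float)
            if np.allclose(p_bar, P_star[l]):
                continue
            if theorem1_condition(l, p_bar, P_star, A, B, L, K, q, sigma):
                # Step 3: one-step policy improvement via Eq. (8): use p_bar on the
                # set X = {x : x' Pi x < Gamma} for mode l, and p* elsewhere.
                return P_star, (l, p_bar), K, L, q
    return P_star, None, K, L, q    # no candidate passes: keep the open-loop control
```

The returned pair (l, p̄), together with Eq. (8), defines the improved closed-loop policy; if no candidate passes the Theorem 1 test, the optimal open-loop control is returned unchanged.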

Numerical example

Example 1 Consider a system with a one-dimensional state and two modes, S={1,2}.
A_1 = 0.5, B_1 = 2, M_1 = 1, N_1 = 10, σ_1 = 2;
A_2 = 1, B_2 = 1, M_2 = 2, N_2 = 10, σ_2 = 1;
Y_1 = {[0.8 0.2], [0.2 0.8]}, Y_2 = {[0.5 0.5]}.
First, we find the optimal open-loop mode control, which gives
$$p^* = \begin{bmatrix}0.8 & 0.2\\ 0.5 & 0.5\end{bmatrix}.$$
Then K1 = 1.236, K2 = 2.614, q1 = 6.549, q2 = 0.664, L1 = 0.094, L2 = 0.161. When l = 1, we have
$$\frac{(\sigma_1^2 K_2 + q_2) - (\sigma_1^2 K_1 + q_1)}{(A_1 - B_1 L_1)^2 (K_1 - K_2)} = 2.772 > 0;$$
thus the optimal admissible control policy of Problem 1, L_1^*, is better than L_2^*. According to Eq. (12), we have
X=[-1.665,1.665].
Using Algorithm 1, we can construct a better closed-loop mode control: when x ∈ X,
$$p'(x) = \begin{bmatrix}0.2 & 0.8\\ 0.5 & 0.5\end{bmatrix};$$
otherwise, p'(x) = p^*. Through simulation, the system performance under policy L_2^* is 4.87, while the performance under the constructed policy L_1' is 4.06, a 17% improvement over L_2^*.

Conclusions

This paper considers two classes of mode control policies and gives a sufficient condition under which the optimal closed-loop mode control is better than the optimal open-loop mode control. When optimizing the JLQG model with controlled mode jump probabilities, we can first find the optimal feedback control law and the optimal open-loop mode control, and then check the sufficient condition of Theorem 1. If it is satisfied, we can further seek better closed-loop mode control policies; otherwise, the open-loop mode control is good enough, and we can stop searching to avoid heavy computation. If the optimal closed-loop mode control is difficult to obtain, we can apply Algorithm 1 to construct a closed-loop mode control that significantly improves the system performance. The numerical example verifies the efficiency of the proposed algorithm.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 60574064, 60736027).
References

1. Cheng D Z, Guo Y Q. Advances on switched systems. Control Theory and Applications, 2005, 22(6): 954-960 (in Chinese)
2. Abou-Kandil H, De Smet O, Freiling G, et al. Flow control in a failure-prone multi-machine manufacturing system. In: Proceedings of INRIA/IEEE Symposium on Emerging Technologies and Factory Automation. 1995, 2: 575-583
3. Boukas E K, Shi P, Andijani A. Robust inventory-production control problem with stochastic demand. Optimal Control Applications and Methods, 1999, 20(1): 1-20
4. Ji Y, Chizeck H J. Controllability, stabilizability, and continuous-time Markovian jump linear quadratic control. IEEE Transactions on Automatic Control, 1990, 35(7): 777-788
5. Costa O, Fragoso M D, Marques R P. Discrete-Time Markov Jump Linear Systems. London: Springer-Verlag, 2005
6. Xue F, Guo L. Necessary and sufficient conditions for adaptive stabilizability of jump linear systems. Communications in Information and Systems, 2001, 1(2): 205-224
7. Zhang L J, Li C W, Cheng D Z. Robust adaptive control of Markov jump systems with parameter uncertainties. Control and Decision, 2005, 20(9): 1030-1033 (in Chinese)
8. Liu F. Robust L2-L∞ filtering for uncertain jump systems. Control and Decision, 2005, 20(1): 32-35 (in Chinese)
9. Liu F, Zhang X H. Robust control for jump systems with L2 gain constraints. Control Theory and Applications, 2006, 23(3): 373-377 (in Chinese)
10. Liu F, Su H Y, Chu J. Robust positive real control of Markov jump systems with parametric uncertainties. Acta Automatica Sinica, 2003, 29(5): 761-766 (in Chinese)
11. Ji Y, Chizeck H J. Optimal quadratic control of jump linear systems with separately controlled transition probabilities. International Journal of Control, 1989, 49(2): 481-491
12. Boukas E K, Liu Z K. Jump linear quadratic regulator with controlled jump rates. IEEE Transactions on Automatic Control, 2001, 46(2): 301-305
13. Xu Y K, Chen X. Discrete-time JLQG with dependently controlled jump probabilities. In: Proceedings of the IEEE 22nd International Symposium on Intelligent Control, Singapore. 2007: 441-445
14. Cao X R. Stochastic Learning and Optimization: A Sensitivity-Based Approach. New York: Springer, 2007
15. Zhang K J, Xu Y K, Chen X, et al. Policy iteration based feedback control. Automatica, 2008, 44(4): 1055-1061
