Low-rank spectral estimation algorithm of learning Markov model

Yongye ZHAO , Shujun BI

Front. Math. China, 2024, Vol. 19, Issue (3): 137-155. DOI: 10.3868/s140-DDD-024-0009-x

RESEARCH ARTICLE

Abstract

This paper proposes a low-rank spectral estimation algorithm for learning Markov models. First, an approximate projection algorithm onto the rank-constrained frequency matrix set is proposed, and its local Lipschitzian error bound is established. Then, we propose a low-rank spectral estimation algorithm for estimating the state transition frequency matrix and probability matrix of a Markov model, which applies the approximate projection algorithm to correct the maximum likelihood estimate of the frequency matrix, and we prove that, under appropriate conditions, the estimation error of the low-rank spectral estimation differs from that of the maximum likelihood estimation by only a multiplying constant. Finally, numerical comparisons with the prevailing maximum likelihood estimation, spectral estimation, and rank-constrained maximum likelihood estimation show that the low-rank spectral estimation algorithm is effective.

Keywords

Markov model / low-rank spectral estimation / error bound / approximate projection

Cite this article

Yongye ZHAO, Shujun BI. Low-rank spectral estimation algorithm of learning Markov model. Front. Math. China, 2024, 19(3): 137-155. DOI: 10.3868/s140-DDD-024-0009-x

1 Introduction

The learning of Markov models arises in fields such as statistics, systems engineering, data science, and management [1-3, 14, 16]. For example, by modeling urban taxi trajectories as a Markov process and clustering its states through estimates of the state transition frequency matrix and probability matrix, one can reveal latent urban transportation patterns and thereby facilitate the allocation of taxi resources [3, 9]. Similarly, learning a suitable Markov model can assist in designing urban bus routes [8], personalized PageRank [2], and data ranking in e-commerce [10]. It should be pointed out that the Markov processes arising from these problems usually have a huge state space, but their state transition frequency matrices and probability matrices have been shown to be low-rank or approximately low-rank [3, 9, 16]. Although the state space is huge, when the state transition frequency matrix is low-rank, the process can be learned accurately from a short empirical trajectory [12, 16, 18]. For further examples and applications of low-rank Markov processes, interested readers may refer to [3, 5, 12, 15, 16] and the references therein.

In recent years, many scholars have studied the estimation of the state transition frequency matrix and probability matrix of low-rank Markov models [5, 16, 18]. As far as we know, existing estimation algorithms with a polynomial time complexity bound cannot guarantee that the estimates of the frequency matrix and the probability matrix satisfy the low-rank condition. For example, Zhang and Wang [16] proposed a spectral estimation method for learning low-rank Markov models by combining low-rank truncated singular value decomposition of the maximum likelihood estimate of the frequency matrix with a non-negative projection, established a statistical error bound for the estimate, and proved that this bound differs from the minimax lower bound by only a logarithmic factor of the Markov chain trajectory length. However, as mentioned in [16], because of the non-negative projection required by the spectral estimation algorithm, the final frequency and probability estimation matrices are not necessarily low-rank. We have also verified this through numerical experiments and found that the spectral estimation matrices are rarely low-rank and in some cases even full-rank. This is also the main reason why the error of spectral estimation fails to be near-optimal. To overcome this defect, [18] proposed nuclear norm regularized maximum likelihood estimation and rank-constrained maximum likelihood estimation for state transition matrices, established statistical error bounds for them, and proved that these bounds differ from the minimax lower bound by only a logarithmic factor of the dimension of the state space, which is theoretically near-optimal. Note, however, that the optimal solution of the nuclear norm regularized problem does not necessarily satisfy the low-rank condition; the optimal solution of the rank-constrained optimization problem does, but computing it is in general NP-hard. Although [18] designed a class of DC (difference of convex functions) programming algorithms to approximately solve the rank-constrained maximum likelihood estimation problem, both the nuclear norm regularized and the rank-constrained maximum likelihood estimation require a large amount of computation, because a singular value decomposition is needed in each step of the algorithm. Therefore, the estimation methods proposed in [18] are not suitable for large-scale Markov model learning. In addition, [5] designed a low-rank recovery algorithm for state transition matrices, but the estimated matrix of this method is not necessarily non-negative, so it is not always a low-rank state transition matrix. Inspired by the above research, this paper seeks a method that can quickly obtain a low-rank frequency matrix and a low-rank probability matrix so as to learn a low-rank Markov model.

Another area related to this paper is the study of error bounds for sets. Error bounds play a core role in the theory and algorithms of optimization, and their study has long been a focus and a challenge in the field [4, 7, 11, 13, 17]. Reference [11] proved that a convex polyhedral set has a global Lipschitz error bound; for general convex feasible systems, Lipschitz error bounds exist only under certain constraint qualifications. Moreover, [11] pointed out that for general non-convex feasible systems it is hard to establish a global error bound, even of Hölder type. Existing results mainly establish local Hölder error bounds for subanalytic systems and polynomial systems [7, 11]. For rank-constrained sets, a special class of non-convex feasible systems, results on error bounds are limited. For a feasible system formed by the intersection of a matrix closed convex set and a rank-constrained set, [4] established a local Lipschitz error bound, namely an estimate of the distance from any matrix in the closed convex set to the feasible system, under the condition that the associated multifunction is calm at the origin. However, verifying the calmness of the multifunction is almost as difficult as establishing the error bound itself. Therefore, the error bound of a general matrix rank-constrained feasible system remains open.

The main work of this paper is to propose a low-rank spectral estimation algorithm for learning Markov models based on an approximate projection algorithm onto the rank-constrained frequency matrix set. Given a positive integer $r \in [1, d-1]$, this paper defines the rank-constrained frequency matrix set as follows:

\[ \mathcal{F} := \{ Z \in \mathbb{R}^{d\times d} : \operatorname{rank}(Z) \le r,\ e^{\mathsf T} Z e = 1,\ Z \ge 0 \}, \]

where $\operatorname{rank}(Z)$ stands for the rank of $Z$, $e \in \mathbb{R}^d$ represents the $d$-dimensional column vector with all entries equal to 1, and $Z \ge 0$ means that $Z_{ij} \ge 0$, $1 \le i,j \le d$. Obviously, $\frac{1}{d^2} e e^{\mathsf T} \in \mathcal{F}$, so the set $\mathcal{F}$ is non-empty and bounded.

First, this paper establishes the local Lipschitz error bound of the set $\mathcal{F}$, that is, an estimate of the distance from any frequency matrix $F \in \{ Z \in \mathbb{R}^{d\times d} : e^{\mathsf T} Z e = 1,\ Z e \ge \alpha,\ Z \ge 0 \}$ to the set $\mathcal{F}$, where $\alpha \in (0, \frac1d)$ is a constant and $Z e \ge \alpha$ means $(Ze)_i \ge \alpha$, $i = 1, 2, \ldots, d$. Second, a low-rank spectral estimation algorithm for the state transition frequency matrix and probability matrix of the Markov model is proposed. The state transition frequency matrix and probability matrix obtained by our algorithm satisfy the low-rank condition, thus overcoming the limitation that the spectral estimate in [16] is not a low-rank state transition matrix. Moreover, our method does not require solving an optimization problem, so its computational complexity is low compared to the rank-constrained maximum likelihood estimation [18]. We further prove that the estimation error of the low-rank spectral estimate differs from that of the maximum likelihood estimate by only a multiplying constant, under the condition that the probability of each state is at least $\alpha$. Finally, the proposed method is numerically compared with maximum likelihood estimation [1], spectral estimation [16], and rank-constrained maximum likelihood estimation [18], showing that our method is effective; in addition, we apply the low-rank spectral estimation algorithm together with the k-means clustering algorithm to the analysis of taxi trajectories in Manhattan Island, New York City, USA.

At the end of this section, we present the main notations of this paper.

(1) Denote by $E \in \mathbb{R}^{d\times d}$ the matrix with all entries equal to 1, and by $e \in \mathbb{R}^d$ the column vector with all entries equal to 1.

(2) For any matrix $Z \in \mathbb{R}^{d\times d}$, define $\|Z\|_\infty := \max_{1\le i,j\le d}|Z_{ij}|$ as the infinity norm, $\|Z\|_1 := \sum_{i,j=1}^d |Z_{ij}|$ as the $l_1$-norm, and $\|Z\|_F := \sqrt{\sum_{i,j=1}^d Z_{ij}^2}$ as the Frobenius norm; the $l$th row vector of $Z$ is written as $Z(l,:) \in \mathbb{R}^d$; given $a \in \mathbb{R}$, $Z \ge a$ means $Z_{ij} \ge a$, $1 \le i,j \le d$. Define the $l_1$-norm distance from $Z \in \mathbb{R}^{d\times d}$ to $\mathcal{F}$ as

\[ \operatorname{dist}(Z, \mathcal{F}) := \min\{ \|Z - X\|_1 : X \in \mathcal{F} \}. \]

(3) Given $x \in \mathbb{R}^d$, define the $l_1$-norm $\|x\|_1 := \sum_{i=1}^d |x_i|$; given $a \in \mathbb{R}$, $x \ge a$ means $x_i \ge a$, $1 \le i \le d$.

(4) For any matrix $Z \in \mathbb{R}^{d\times d}$, define $\Pi_r(Z) := \sum_{i=1}^r \sigma_i(Z) u_i(Z) v_i(Z)^{\mathsf T}$, where $\sigma_i(Z)$ is the $i$th largest singular value of $Z$, and $u_i(Z) \in \mathbb{R}^d$, $v_i(Z) \in \mathbb{R}^d$ are the corresponding left and right singular vectors, $i = 1, 2, \ldots, r$. As is well known, $\Pi_r(Z) \in \operatorname{argmin}\{ \|Z - X\|_F : \operatorname{rank}(X) \le r \}$.

(5) Define $\Omega_\alpha := \{ Z \in \mathbb{R}^{d\times d} : e^{\mathsf T} Z e = 1,\ Z e \ge \alpha,\ Z \ge 0 \}$, where $\alpha \in (0, \frac1d)$ is a constant.
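As a concrete illustration of the truncated singular value decomposition $\Pi_r$ in notation (4), the following minimal NumPy sketch (the function name pi_r is ours) computes the best rank-$r$ approximation from a full SVD:

import numpy as np

def pi_r(Z, r):
    # Pi_r(Z): keep the r largest singular values and the corresponding
    # left/right singular vectors; this is the best rank-r approximation
    # of Z in the Frobenius norm.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]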

2 Algorithm and its properties

In this section, we propose a low-rank spectral estimation algorithm for the state transition frequency matrix and probability matrix of a Markov model. We consider a discrete-time Markov process with $d\ (\ge 2)$ states $\{S_1, S_2, \ldots, S_d\}$, and assume that its state transition probability matrix is $\bar P \in \mathbb{R}^{d\times d}$ and its frequency matrix is $\bar F \in \mathbb{R}^{d\times d}$, satisfying the low-rank condition $\operatorname{rank}(\bar P) = \operatorname{rank}(\bar F) = r \ll d$. The goal is to estimate the matrices $\bar P$ and $\bar F$ from a Markov chain $\{X_0, X_1, \ldots, X_n\}$ with trajectory length $n+1$. First, we present the assumption needed in this section.

Assumption 2.1  $\{X_0, X_1, \ldots, X_n\}$ is an ergodic Markov chain and there exists a constant $\alpha \in (0, \frac1d)$ such that $\bar F e \ge \alpha$.

Assumption 2.1 is also required for the theoretical analysis in [16]. Denote the stationary distribution of the Markov chain by $\pi \in \mathbb{R}^d$. It follows from [16] that $\bar F = \operatorname{Diag}(\pi)\bar P$, where $\operatorname{Diag}(\pi) \in \mathbb{R}^{d\times d}$ is the diagonal matrix whose $i$th diagonal element is $\pi_i$ ($i = 1, 2, \ldots, d$). Since $\bar P e = e$ and $\pi = \bar F e$, Assumption 2.1 is equivalent to $\pi \ge \alpha$. When $\min_{1\le i\le d}\pi_i$ is relatively small, some states appear in $X_0, X_1, \ldots, X_n$ with extremely low probability, which poses a challenge for estimating the state transition frequency matrix and probability matrix.
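As a small illustration of the identities $\bar F = \operatorname{Diag}(\pi)\bar P$ and $\pi = \bar F e$, the following sketch (with purely illustrative toy numbers) builds the frequency matrix from a transition matrix and its stationary distribution:

import numpy as np

# A small row-stochastic transition matrix (toy numbers, purely illustrative).
P_bar = np.array([[0.6, 0.3, 0.1],
                  [0.5, 0.4, 0.1],
                  [0.5, 0.2, 0.3]])

# Stationary distribution: left eigenvector of P_bar for eigenvalue 1.
w, V = np.linalg.eig(P_bar.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()

F_bar = np.diag(pi) @ P_bar                  # F_bar = Diag(pi) P_bar
print(np.isclose(F_bar.sum(), 1.0))          # e^T F_bar e = 1
print(np.allclose(F_bar @ np.ones(3), pi))   # F_bar e = pi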

2.1 Approximate projection algorithm of rank-constrained frequency matrix set

As is widely known, for any matrix $F \in \mathbb{R}^{d\times d}$, it is NP-hard to compute its projection onto $\mathcal{F}$ [6]. In the following, we propose a fast approximate projection algorithm onto $\mathcal{F}$ for any $F \in \Omega_\alpha$, and establish its local Lipschitz error bound.

Algorithm 2.1 (approximate projection algorithm of $F \in \Omega_\alpha$ onto $\mathcal{F}$)

Input: $F \in \Omega_\alpha$, $r \in [1, d-1]$, $\kappa \in (0, 0.5)$, $\mu \ge \eta > 0$, iteration step $T$.

(S1) Compute $\Pi_r(F)$. If $\min_{1\le i\le d}(\Pi_r(F)e)_i < \kappa\alpha$, output $\hat F = \frac{1}{d^2}E$ and stop; otherwise turn to (S2).

(S2) If $\min_{1\le i,j\le d}(\Pi_r(F))_{ij} \ge 0$, output $\hat F = \frac{\Pi_r(F)}{\|\Pi_r(F)\|_1}$ and stop; otherwise turn to (S3).

(S3) Set $F_0 = \Pi_r(F)$, $k = 1$, $w \in \mathbb{R}^d$ with $w_i = \min(\mu, \max((F^{\mathsf T}e)_i, \eta))$, and $t_j = 0$, $1 \le j \le T$.

Compute

while $\min_{1\le i,j\le d}(F_0)_{ij} < 0$ and $k \le T$

  if $k \le T-1$

   $[I, J] = \operatorname{find}\bigl(F_0 = \min_{1\le i,j\le d}(F_0)_{ij}\bigr)$;  $t_k = \min_{1\le i\le d,\, j\in J} \dfrac{F_0(i,j)}{(\Pi_r(F)e)_i}$;

   $F_0 = F_0 - t_k \dfrac{\Pi_r(F)e\, w^{\mathsf T}}{\min_{j\in J} w_j}$;

  else

   $t_k = \min_{1\le i,j\le d} \dfrac{F_0(i,j)}{(\Pi_r(F)e)_i}$;  $F_0 = F_0 - t_k\, \Pi_r(F)e\, e^{\mathsf T}$;

  end if

   $k = k + 1$;

end while

Output $\hat F = \dfrac{F_0}{\|F_0\|_1}$ and stop.
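To make the steps above concrete, the following is a minimal NumPy sketch of Algorithm 2.1 as we read the pseudocode; the function name approx_projection is ours, and $\alpha$ is passed explicitly as an argument (in the paper it is implicit in the assumption $F \in \Omega_\alpha$):

import numpy as np

def approx_projection(F, alpha, r, kappa, mu, eta, T):
    # Approximately project F (assumed to lie in Omega_alpha) onto the
    # rank-constrained frequency matrix set.
    d = F.shape[0]
    e = np.ones(d)
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    PrF = (U[:, :r] * s[:r]) @ Vt[:r, :]             # Pi_r(F): best rank-r approximation
    row = PrF @ e                                    # Pi_r(F) e
    # (S1): if some row sum of Pi_r(F) is too small, return the uniform matrix.
    if row.min() < kappa * alpha:
        return np.full((d, d), 1.0 / d**2)
    # (S2): if Pi_r(F) is already nonnegative, rescale it to unit l1-norm.
    if PrF.min() >= 0:
        return PrF / np.abs(PrF).sum()
    # (S3): lift negative entries by nonnegative rank-one corrections.
    F0 = PrF.copy()
    w = np.minimum(mu, np.maximum(F.T @ e, eta))     # column weights of F, clipped to [eta, mu]
    for k in range(1, T + 1):
        if F0.min() >= 0:
            break
        if k <= T - 1:
            J = np.where(np.isclose(F0, F0.min()))[1]    # columns attaining the current minimum
            tk = (F0[:, J] / row[:, None]).min()
            F0 = F0 - tk * np.outer(row, w) / w[J].min()
        else:
            tk = (F0 / row[:, None]).min()               # last pass: make every entry nonnegative
            F0 = F0 - tk * np.outer(row, e)
    return F0 / np.abs(F0).sum()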

Lemma 2.1  For any $F \in \Omega_\alpha$, the output matrix $\hat F$ of Algorithm 2.1 satisfies $\hat F \in \mathcal{F}$ and $\hat F e > 0$.

Proof If $\min_{1\le i\le d}(\Pi_r(F)e)_i < \kappa\alpha$, then $\hat F = \frac{1}{d^2}E \in \mathcal{F}$ and $\hat F e = \frac{1}{d}e$, so the conclusion holds. If $\min_{1\le i\le d}(\Pi_r(F)e)_i \ge \kappa\alpha$ and $\min_{1\le i,j\le d}(\Pi_r(F))_{ij} \ge 0$, then $\hat F = \frac{\Pi_r(F)}{\|\Pi_r(F)\|_1} \in \mathcal{F}$ and $\hat F e = \frac{\Pi_r(F)e}{\|\Pi_r(F)\|_1} \ge \frac{\kappa\alpha}{\|\Pi_r(F)\|_1}e > 0$, so the conclusion also holds.

Next, assume $\min_{1\le i\le d}(\Pi_r(F)e)_i \ge \kappa\alpha$ and $\min_{1\le i,j\le d}(\Pi_r(F))_{ij} < 0$. It follows from (S3) of Algorithm 2.1 that the final output satisfies $F_0 \ge 0$ and can be expressed as $F_0 = \Pi_r(F) - \Pi_r(F)e\, q^{\mathsf T}$, where $q \in \mathbb{R}^d$ and $q \le 0$. Since $\Pi_r(F)e > 0$, we have $\operatorname{rank}(F_0) \le r$ and $\|F_0\|_1 \ge \|\Pi_r(F)e\|_1 \ge \kappa\alpha$. Hence $\hat F = \frac{F_0}{\|F_0\|_1} \in \mathcal{F}$. Noting that all $t_k \le 0$ in (S3), we get $\hat F e = \frac{F_0 e}{\|F_0\|_1} \ge \frac{\kappa\alpha}{\|F_0\|_1}e > 0$.  □

The following proposition establishes the local Lipschitz error bound of the rank-constrained frequency matrix set, namely an estimate of the $l_1$-norm distance from any matrix $F \in \Omega_\alpha$ to the set $\mathcal{F}$.

Proposition 2.1  For any $F \in \Omega_\alpha$, let $\hat F$ be the output of Algorithm 2.1. Then the following error bound inequality holds:

\[ \operatorname{dist}(F, \mathcal{F}) \le \|F - \hat F\|_1 \le \frac{2\|F - \Pi_r(F)\|_1}{\alpha(1-\kappa)} + \frac{4 d \mu T\, \|F - \Pi_r(F)\|_\infty}{\eta\alpha\kappa}. \tag{2.1} \]

Proof Fix any $F \in \Omega_\alpha$. It follows from Lemma 2.1 that $\hat F \in \mathcal{F}$. Since $\operatorname{dist}(F,\mathcal{F})$ is the $l_1$-norm distance, $\operatorname{dist}(F,\mathcal{F}) \le \|F - \hat F\|_1 \le \|F\|_1 + \|\hat F\|_1 = 2$. We consider the following three cases.

Case 1  $\min_{1\le i\le d}(\Pi_r(F)e)_i < \kappa\alpha$. Since $F \in \Omega_\alpha$, $Fe \ge \alpha$. So

\[ \|F - \Pi_r(F)\|_1 \ge \|Fe - \Pi_r(F)e\|_\infty \ge \alpha - \min_{1\le i\le d}(\Pi_r(F)e)_i > \alpha - \kappa\alpha. \]

Thus $\operatorname{dist}(F,\mathcal{F}) \le 2 \le \frac{2\|F-\Pi_r(F)\|_1}{\alpha(1-\kappa)}$, and inequality (2.1) holds.

Case 2  $\min_{1\le i\le d}(\Pi_r(F)e)_i \ge \kappa\alpha$ and $\min_{1\le i,j\le d}(\Pi_r(F))_{ij} \ge 0$. Then $\hat F = \frac{\Pi_r(F)}{\|\Pi_r(F)\|_1}$. Based on $\|F\|_1 = e^{\mathsf T}Fe = 1$, we get

\[ \|F - \hat F\|_1 = \Bigl\|F - \Pi_r(F) + \Pi_r(F) - \frac{\Pi_r(F)}{\|\Pi_r(F)\|_1}\Bigr\|_1 \le \|F - \Pi_r(F)\|_1 + \Bigl\|\Pi_r(F) - \frac{\Pi_r(F)}{\|\Pi_r(F)\|_1}\Bigr\|_1 = \|F - \Pi_r(F)\|_1 + \bigl|\|\Pi_r(F)\|_1 - \|F\|_1\bigr| \le 2\|F - \Pi_r(F)\|_1. \]

Thus, $\operatorname{dist}(F,\mathcal{F}) \le \|F - \hat F\|_1 \le 2\|F - \Pi_r(F)\|_1 \le \frac{2\|F-\Pi_r(F)\|_1}{\alpha(1-\kappa)}$, and inequality (2.1) holds.

Case 3  $\min_{1\le i\le d}(\Pi_r(F)e)_i \ge \kappa\alpha$ and $\min_{1\le i,j\le d}(\Pi_r(F))_{ij} < 0$. If $\|\Pi_r(F)\|_1 > 2$, then

\[ \operatorname{dist}(F,\mathcal{F}) \le 2 \le 2\bigl(\|\Pi_r(F)\|_1 - \|F\|_1\bigr) \le 2\|F - \Pi_r(F)\|_1 \le \frac{2\|F-\Pi_r(F)\|_1}{\alpha(1-\kappa)}. \]

Therefore, inequality (2.1) holds. Now assume $\|\Pi_r(F)\|_1 \le 2$. It follows from (S3) of Algorithm 2.1 that $t_k \le 0$, $k = 1, 2, \ldots, T$, and combining this with $\frac{\Pi_r(F)e\, w^{\mathsf T}}{\min_{j\in J}w_j} \ge 0$, the elements of $F_0$ increase with $k$. Hence $\{t_k\}$ is a non-decreasing sequence while $\{|t_k|\}$ is a non-increasing sequence, and

\[ |t_k| \le |t_1| \le \Bigl|\min_{1\le i,j\le d}\frac{\Pi_r(F)(i,j)}{(\Pi_r(F)e)_i}\Bigr| \le \frac{\bigl|\min_{1\le i,j\le d}\Pi_r(F)(i,j)\bigr|}{\kappa\alpha}, \qquad k = 1, 2, \ldots, T. \]

Note that $F \ge 0$ and $\min_{1\le i,j\le d}\Pi_r(F)(i,j) < 0$, so $\bigl|\min_{1\le i,j\le d}\Pi_r(F)(i,j)\bigr| \le \|F - \Pi_r(F)\|_\infty$. Thus

\[ \|F_0 - \Pi_r(F)\|_1 \le \frac{d\mu}{\eta}\|\Pi_r(F)\|_1 \sum_{i=1}^T |t_i| \le \frac{2Td\mu}{\eta}\cdot\frac{\|F-\Pi_r(F)\|_\infty}{\kappa\alpha}, \tag{2.2} \]

where $F_0$ is the final output of Algorithm 2.1 (S3). By using $\hat F = \frac{F_0}{\|F_0\|_1}$, $\|F\|_1 = e^{\mathsf T}Fe = 1$, and inequality (2.2), it follows that

\[ \operatorname{dist}(F,\mathcal{F}) \le \|F - F_0\|_1 + \|F_0 - \hat F\|_1 \le \|F - F_0\|_1 + \Bigl\|F_0 - \frac{F_0}{\|F_0\|_1}\Bigr\|_1 = \|F - F_0\|_1 + \bigl|\|F_0\|_1 - 1\bigr| = \|F - F_0\|_1 + \bigl|\|F_0\|_1 - \|F\|_1\bigr| \le 2\|F - F_0\|_1 \le 2\|F - \Pi_r(F)\|_1 + 2\|F_0 - \Pi_r(F)\|_1 \le \frac{2\|F-\Pi_r(F)\|_1}{\alpha(1-\kappa)} + \frac{4d\mu T\, \|F-\Pi_r(F)\|_\infty}{\eta\alpha\kappa}. \]

Therefore, inequality (2.1) is established. □

Note 2.1 (i) Although the right-hand side of the error bound inequality in Proposition 2.1 contains the dimensional factor $d$, namely through $d\|F - \Pi_r(F)\|_\infty$, we have $d\|F - \Pi_r(F)\|_\infty = O(\|F - \Pi_r(F)\|_1)$ when $F - \Pi_r(F)$ is a dense matrix with relatively uniform elements. In future work, we will study whether $d\|F - \Pi_r(F)\|_\infty = O(\|F - \Pi_r(F)\|_1)$ holds for the frequency matrix estimation of low-rank Markov models.

(ii) Note that the right-hand side of the error bound inequality in Proposition 2.1 is smallest when $\mu = \eta$ and $T = 1$. This is because we scaled $\frac{w}{\min_{j\in J}w_j}$ up to $\frac{\mu}{\eta}e$ in deriving inequality (2.2). In fact, when $\mu \ge F^{\mathsf T}e \ge \eta$ (elementwise), $w = F^{\mathsf T}e$, so $w$ captures the weights of the columns of $F$, and using it to correct $F_0$ can improve the quality of the approximate projection $\hat F$.

2.2 Low-rank spectral estimation algorithm

Define $n_{ij} := |\{k : X_{k-1} = S_i,\ X_k = S_j,\ k = 1, 2, \ldots, n\}|$, $1 \le i, j \le d$, that is, $n_{ij}$ is the number of transitions from state $S_i$ to state $S_j$. Define the matrices $\tilde F \in \mathbb{R}^{d\times d}$ and $\tilde P \in \mathbb{R}^{d\times d}$ as follows:

\[ \tilde F(i,j) = \frac{n_{ij}}{n}, \qquad \tilde P(i,j) = \begin{cases} \dfrac{n_{ij}}{\sum_{l=1}^d n_{il}}, & \text{if } \sum_{l=1}^d n_{il} > 0; \\[1ex] \dfrac{1}{d}, & \text{if } \sum_{l=1}^d n_{il} = 0, \end{cases} \qquad 1 \le i, j \le d. \]

It was proved in [1] that $\tilde F$ and $\tilde P$ are the maximum likelihood estimates of $\bar F$ and $\bar P$, respectively, and that they are strongly consistent. However, the maximum likelihood estimation does not exploit the low-rank information $\operatorname{rank}(\bar P) = \operatorname{rank}(\bar F) = r \ll d$. Taking the low-rankness of the matrices $\bar F$ and $\bar P$ to be estimated into account, we present the low-rank spectral estimation algorithm for the Markov process.
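In code, the transition counts $n_{ij}$ and the maximum likelihood estimates $\tilde F$, $\tilde P$ can be formed as follows (a sketch; states are assumed to be coded as integers $0, \ldots, d-1$, and the function name is ours):

import numpy as np

def mle_frequency_probability(X, d):
    # Maximum likelihood estimates F_tilde and P_tilde from a trajectory
    # X = (X_0, X_1, ..., X_n).
    n = len(X) - 1
    counts = np.zeros((d, d))
    for k in range(1, n + 1):
        counts[X[k - 1], X[k]] += 1               # n_ij: number of transitions from S_i to S_j
    F_tilde = counts / n
    row_sums = counts.sum(axis=1)
    P_tilde = np.full((d, d), 1.0 / d)            # rows never visited get the uniform row
    visited = row_sums > 0
    P_tilde[visited] = counts[visited] / row_sums[visited, None]
    return F_tilde, P_tilde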

Algorithm 2.2 (Low-rank spectral estimation algorithm)

Input: $\tilde F$, $\alpha \in (0, \frac1d)$, $r \in [1, d-1]$, $\kappa \in (0, 0.5)$, $\mu \ge \eta > 0$, integer $T$.

(S1) Compute the corrected maximum likelihood estimate $F_1 \in \mathbb{R}^{d\times d}$:

$m_+ = \sum_{i=1}^d \max((\tilde F e)_i - \alpha,\, 0)$;  $m_- = \sum_{i=1}^d \max(\alpha - (\tilde F e)_i,\, 0)$;

     For $i = 1$ to $d$

     if $(\tilde F e)_i = 0$

      $F_1(i,:) = \frac{\alpha}{d} e^{\mathsf T}$;

     elseif $(\tilde F e)_i < \alpha$

      $F_1(i,:) = \tilde F(i,:) + \bigl(\alpha - (\tilde F e)_i\bigr)\dfrac{\tilde F(i,:)}{(\tilde F e)_i}$;

     else

      $F_1(i,:) = \tilde F(i,:) - \dfrac{m_-}{m_+}\bigl((\tilde F e)_i - \alpha\bigr)\dfrac{\tilde F(i,:)}{(\tilde F e)_i}$;

     end if

     end for

(S2) Use Algorithm 2.1 to compute the estimate $\hat F_1$ of $\bar F$, with $F_1$, $r$, $\kappa$, $\mu$, $\eta$, $T$ as the input.

(S3) Compute the estimate $\hat P_1 \in \mathbb{R}^{d\times d}$ of $\bar P$, where $\hat P_1(i,:) = \dfrac{\hat F_1(i,:)}{\|\hat F_1(i,:)\|_1}$, $i = 1, 2, \ldots, d$.
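A compact NumPy sketch of Algorithm 2.2 is given below; it reuses the approx_projection sketch given after Algorithm 2.1, and the function name low_rank_spectral_estimation is ours:

import numpy as np

def low_rank_spectral_estimation(F_tilde, alpha, r, kappa, mu, eta, T):
    # (S1): correct the MLE so that every row sum of F1 is at least alpha;
    # (S2): apply the approximate projection of Algorithm 2.1;
    # (S3): row-normalize to obtain the probability estimate.
    d = F_tilde.shape[0]
    e = np.ones(d)
    row = F_tilde @ e
    m_plus = np.maximum(row - alpha, 0).sum()
    m_minus = np.maximum(alpha - row, 0).sum()
    ratio = m_minus / m_plus if m_plus > 0 else 0.0
    F1 = np.empty_like(F_tilde)
    for i in range(d):
        if row[i] == 0:
            F1[i] = alpha / d * e
        elif row[i] < alpha:
            F1[i] = F_tilde[i] + (alpha - row[i]) * F_tilde[i] / row[i]
        else:
            F1[i] = F_tilde[i] - ratio * (row[i] - alpha) * F_tilde[i] / row[i]
    F1_hat = approx_projection(F1, alpha, r, kappa, mu, eta, T)   # Algorithm 2.1 (sketch above)
    P1_hat = F1_hat / np.abs(F1_hat).sum(axis=1, keepdims=True)
    return F1_hat, P1_hat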

The following Theorem 2.1(a) shows that step (S1) of Algorithm 2.2 yields $F_1 \in \Omega_\alpha$. It then follows from Lemma 2.1 that $\hat F_1 \in \mathcal{F}$ is a low-rank frequency matrix with $\hat F_1 e > 0$, so the $\hat P_1$ generated by Algorithm 2.2 (S3) is well defined and is a low-rank state transition probability matrix, namely $\hat P_1 \in \{P \in \mathbb{R}^{d\times d} : \operatorname{rank}(P) \le r,\ Pe = e,\ P \ge 0\}$. Therefore, the estimation matrices $\hat F_1$ and $\hat P_1$ in Algorithm 2.2 satisfy the low-rank condition, overcoming the defect of the spectral estimation in [16]. In particular, Algorithm 2.2 does not need to solve a matrix optimization problem, so its computational cost is low. The following are the main conclusions of this section.

Theorem 2.1  Assume that $F_1$, $\hat F_1$, $\hat P_1$ are generated by Algorithm 2.2. If Assumption 2.1 holds, then the following conclusions hold.

(a) $F_1 \in \Omega_\alpha$ and $\|F_1 - \bar F\|_1 \le 3\|\tilde F - \bar F\|_1$;

(b) $\|\hat F_1 - \bar F\|_1 \le 3\Bigl(1 + \dfrac{2d}{\alpha(1-\kappa)} + \dfrac{4d\mu T}{\eta\kappa\alpha}\Bigr)\|\tilde F - \bar F\|_1$;

(c) $\|\hat P_1 - \bar P\|_1 \le \dfrac{6}{\alpha}\Bigl(1 + \dfrac{2d}{\alpha(1-\kappa)} + \dfrac{4d\mu T}{\eta\kappa\alpha}\Bigr)\|\tilde F - \bar F\|_1$.

Proof (a) Fix any index $i \in \{1, 2, \ldots, d\}$. If $0 \le (\tilde F e)_i < \alpha$, then obviously $(F_1 e)_i = \alpha$ and $F_1(i,j) \ge 0$, $j = 1, 2, \ldots, d$. If $(\tilde F e)_i \ge \alpha$, then $(F_1 e)_i = (\tilde F e)_i - \frac{m_-}{m_+}\bigl((\tilde F e)_i - \alpha\bigr)$. By using $1 > d\alpha$ and

\[ 1 = \|\tilde F\|_1 = \sum_{i=1}^d (\tilde F e)_i = \sum_{(\tilde F e)_i \ge \alpha}\bigl(((\tilde F e)_i - \alpha) + \alpha\bigr) + \sum_{(\tilde F e)_i < \alpha}\bigl(\alpha - (\alpha - (\tilde F e)_i)\bigr) = d\alpha + m_+ - m_-, \]

we obtain $m_+ > m_-$. Hence, if $(\tilde F e)_i \ge \alpha$, then $(F_1 e)_i \ge \alpha$ and $F_1(i,j) \ge 0$, $j = 1, 2, \ldots, d$. By direct verification, $\|F_1\|_1 = e^{\mathsf T}F_1 e = 1$. So $F_1 \in \Omega_\alpha$. Based on Assumption 2.1 and the definitions of $m_+$ and $m_-$, it follows that

\[ \sum_{(\tilde F e)_i \ge \alpha}\sum_{j=1}^d |F_1(i,j) - \bar F(i,j)| = \sum_{(\tilde F e)_i \ge \alpha}\sum_{j=1}^d \Bigl|\tilde F(i,j) - \bar F(i,j) - \frac{m_-}{m_+}\bigl((\tilde F e)_i - \alpha\bigr)\frac{\tilde F(i,j)}{(\tilde F e)_i}\Bigr| \le \sum_{(\tilde F e)_i \ge \alpha}\Bigl(\|\tilde F(i,:) - \bar F(i,:)\|_1 + \frac{m_-}{m_+}\bigl((\tilde F e)_i - \alpha\bigr)\Bigr) = \sum_{(\tilde F e)_i \ge \alpha}\|\tilde F(i,:) - \bar F(i,:)\|_1 + m_-. \]

Similarly, we can prove $\sum_{(\tilde F e)_i < \alpha}\sum_{j=1}^d |F_1(i,j) - \bar F(i,j)| \le \sum_{(\tilde F e)_i < \alpha}\|\tilde F(i,:) - \bar F(i,:)\|_1 + m_-$. With Assumption 2.1, $m_- = \sum_{i=1}^d \max(\alpha - (\tilde F e)_i, 0) \le \sum_{i=1}^d \max((\bar F e)_i - (\tilde F e)_i, 0) \le \|\tilde F - \bar F\|_1$. Based on the above,

\[ \|F_1 - \bar F\|_1 = \sum_{(\tilde F e)_i \ge \alpha}\sum_{j=1}^d |F_1(i,j) - \bar F(i,j)| + \sum_{(\tilde F e)_i < \alpha}\sum_{j=1}^d |F_1(i,j) - \bar F(i,j)| \le 3\|\tilde F - \bar F\|_1. \]

(b) Based on the definition of $\Pi_r(F_1)$ and $\operatorname{rank}(\bar F) \le r$, it follows that $\|F_1 - \Pi_r(F_1)\|_F \le \|F_1 - \bar F\|_F$. Therefore, by Proposition 2.1, conclusion (a), and the relationship between the $l_1$-norm and the Frobenius norm, we obtain

\[ \|\hat F_1 - \bar F\|_1 \le \|\hat F_1 - F_1\|_1 + \|F_1 - \bar F\|_1 \le \frac{2\|F_1 - \Pi_r(F_1)\|_1}{\alpha(1-\kappa)} + \frac{4d\mu T\, \|F_1 - \Pi_r(F_1)\|_\infty}{\eta\alpha\kappa} + \|F_1 - \bar F\|_1 \le \frac{2d\|F_1 - \Pi_r(F_1)\|_F}{\alpha(1-\kappa)} + \frac{4d\mu T\, \|F_1 - \Pi_r(F_1)\|_F}{\eta\alpha\kappa} + \|F_1 - \bar F\|_1 \le \frac{2d\|F_1 - \bar F\|_F}{\alpha(1-\kappa)} + \frac{4d\mu T\, \|F_1 - \bar F\|_F}{\eta\alpha\kappa} + \|F_1 - \bar F\|_1 \le \frac{2d\|F_1 - \bar F\|_1}{\alpha(1-\kappa)} + \frac{4d\mu T\, \|F_1 - \bar F\|_1}{\eta\alpha\kappa} + \|F_1 - \bar F\|_1 \le 3\Bigl(1 + \frac{2d}{\alpha(1-\kappa)} + \frac{4d\mu T}{\eta\kappa\alpha}\Bigr)\|\tilde F - \bar F\|_1. \tag{2.3} \]

(c) Note that $\bar P(i,:) = \frac{\bar F(i,:)}{\|\bar F(i,:)\|_1}$, $i = 1, 2, \ldots, d$. With [16, Lemma 2] and $\|\bar F(i,:)\|_1 = (\bar F e)_i \ge \alpha$, $i = 1, 2, \ldots, d$, we get

\[ \|\hat P_1 - \bar P\|_1 = \sum_{i=1}^d \|\hat P_1(i,:) - \bar P(i,:)\|_1 = \sum_{i=1}^d \Bigl\|\frac{\hat F_1(i,:)}{\|\hat F_1(i,:)\|_1} - \frac{\bar F(i,:)}{\|\bar F(i,:)\|_1}\Bigr\|_1 \le \sum_{i=1}^d \frac{2\|\hat F_1(i,:) - \bar F(i,:)\|_1}{\|\bar F(i,:)\|_1} \le \frac{2}{\alpha}\|\hat F_1 - \bar F\|_1. \]

Combining this with the result of (b), $\|\hat P_1 - \bar P\|_1 \le \frac{6}{\alpha}\Bigl(1 + \frac{2d}{\alpha(1-\kappa)} + \frac{4d\mu T}{\eta\kappa\alpha}\Bigr)\|\tilde F - \bar F\|_1$. □

Note 2.2 (i) Theorem 2.1 shows that the estimation error of Algorithm 2.2 differs from that of the maximum likelihood estimation by only a multiplying constant. Note that the right-hand sides of the inequalities in Theorem 2.1 (b) and (c) contain the dimensional factor $d$ of the state space. This is because we directly scaled $\|F_1 - \bar F\|_F$ up to $\|F_1 - \bar F\|_1$ in inequality (2.3). In fact, if $F_1 - \bar F$ is a dense matrix with relatively uniform elements, then $d\|F_1 - \bar F\|_F = O(\|F_1 - \bar F\|_1)$, which further improves the conclusions in Theorem 2.1 (b) and (c).

(ii) By [16, Lemma 7], there exist constants $C > 0$ and $c > 1$ such that, with probability at least $1 - n^{-c}$, $\|\tilde F - \bar F\| \le C\sqrt{\dfrac{\|\bar F e\|_\infty\, \tau \log^2(n)}{n}}$, where $\|\cdot\|$ stands for the spectral norm of a matrix, $\|\cdot\|_\infty$ for the infinity norm of a vector, and $\tau$ for the $\frac14$-mixing time of the Markov chain. In this way, the statistical error bound of the low-rank spectral estimation matrices generated by Algorithm 2.2 can be worked out by combining Theorem 2.1 with the inequalities $\|X\|_1 \le d\|X\|_F \le d\sqrt{d}\,\|X\|$, $X \in \mathbb{R}^{d\times d}$.

3 Numerical experiments

In this section, we first compare the numerical performance of the low-rank spectral estimation Algorithm 2.2 with spectral estimation [16], maximum likelihood estimation [1], and rank-constrained maximum likelihood estimation [18] through experiments on synthetic data. Then, we use Algorithm 2.2 combined with the k-means clustering algorithm to analyze the public dataset of taxi trajectories in Manhattan Island, New York City, USA, and to reveal its potential transportation patterns. All numerical tests in this section were conducted on a laptop with an Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz and 8.00GB RAM, running Matlab R2019a under the Windows 10 system.

3.1 Synthetic data experiments

First, the true frequency matrix $\bar F$ and probability matrix $\bar P$ with rank $r$ are generated through Examples 3.1 and 3.2 below. Then, based on $\bar F$ and $\bar P$, a Markov chain $\{X_0, X_1, \ldots, X_n\}$ with trajectory length $n = \operatorname{round}(k r d \log(d))$, where $k \in \{1, 1.5, \ldots, 5.5\}$, is simulated on a computer.
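The trajectory itself can be simulated as in the sketch below (the function name is ours; drawing $X_0$ from the stationary distribution $\pi = \bar F e$ is one reasonable choice, since the text does not specify the initial state):

import numpy as np

def simulate_chain(P_bar, pi, n, seed=0):
    # Simulate a trajectory X_0, ..., X_n from the transition matrix P_bar,
    # with X_0 drawn from the distribution pi.
    rng = np.random.default_rng(seed)
    d = P_bar.shape[0]
    X = np.empty(n + 1, dtype=int)
    X[0] = rng.choice(d, p=pi)
    for k in range(1, n + 1):
        X[k] = rng.choice(d, p=P_bar[X[k - 1]])
    return X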

In Algorithm 2.2, we set $\alpha = \frac{1}{d^2}$, $\kappa = 0.001$, $\mu = \|F_1^{\mathsf T}e\|_\infty$, $\eta = \min_{1\le i\le d}(F_1^{\mathsf T}e)_i$, $T = 50$. Let $F$ and $P$ be the estimates of $\bar F$ and $\bar P$, respectively; we measure the estimation performance with the following three quantities:

\[ \eta_F = \|F - \bar F\|_1, \qquad \eta_P = \frac{\|P - \bar P\|_1}{d}, \qquad \eta_{U,V} = \max\bigl(r - \|\bar U^{\mathsf T}U\|_F^2,\ r - \|\bar V^{\mathsf T}V\|_F^2\bigr), \]

where $U \in \mathbb{R}^{d\times r}$, $V \in \mathbb{R}^{d\times r}$ are the matrices of left and right singular vectors corresponding to the $r$ largest singular values of $P$, and $\bar U \in \mathbb{R}^{d\times r}$, $\bar V \in \mathbb{R}^{d\times r}$ are those of $\bar P$; $\eta_{U,V}$ can be regarded as the estimation error of the singular subspaces. Since the rank-constrained maximum likelihood estimation [18] only provides an estimate of the state transition probability matrix $\bar P$, we compare only $\eta_P$ and $\eta_{U,V}$ for it.
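Under these definitions (as we read them from the text above), the three error measures can be computed as in the following sketch (the function name is ours):

import numpy as np

def estimation_errors(F, P, F_bar, P_bar, r):
    # eta_F, eta_P and the singular subspace error eta_{U,V}.
    d = P.shape[0]
    eta_F = np.abs(F - F_bar).sum()
    eta_P = np.abs(P - P_bar).sum() / d
    U, _, Vt = np.linalg.svd(P, full_matrices=False)
    Ub, _, Vbt = np.linalg.svd(P_bar, full_matrices=False)
    U, V = U[:, :r], Vt[:r].T
    Ub, Vb = Ub[:, :r], Vbt[:r].T
    eta_UV = max(r - np.linalg.norm(Ub.T @ U, 'fro')**2,
                 r - np.linalg.norm(Vb.T @ V, 'fro')**2)
    return eta_F, eta_P, eta_UV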

Example 3.1 Generate matrices $U_0 \in \mathbb{R}^{d\times r}$, $V_0 \in \mathbb{R}^{d\times r}$ whose elements are independent and obey the standard normal distribution. Consider

Type I: $Z = |U_0|\,|V_0|^{\mathsf T}$, where the elements of $|U_0| \in \mathbb{R}^{d\times r}$, $|V_0| \in \mathbb{R}^{d\times r}$ are the absolute values of the elements of $U_0$, $V_0$;

Type II: $Z = \tilde U_0 \tilde V_0^{\mathsf T}$, where $\tilde U_0 \in \mathbb{R}^{d\times r}$, $\tilde V_0 \in \mathbb{R}^{d\times r}$ are defined by $(\tilde U_0)_{ij} := \dfrac{(U_0)_{ij}^2}{\|U_0(i,:)\|_2}$, $(\tilde V_0)_{ij} := \dfrac{(V_0)_{ij}^2}{\|V_0(:,j)\|_2}$.

Define $\bar F \in \mathbb{R}^{d\times d}$ and $\bar P \in \mathbb{R}^{d\times d}$ by $\bar F := \dfrac{Z}{\|Z\|_1}$, $\bar P(i,:) := \dfrac{\bar F(i,:)}{\|\bar F(i,:)\|_1}$, $i = 1, 2, \ldots, d$.
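A sketch of how the Type I and Type II matrices of Example 3.1 can be generated, following the definitions above (the function name and seed handling are ours):

import numpy as np

def example_3_1(d, r, model='I', seed=0):
    # Generate the rank-r pair (F_bar, P_bar) of Example 3.1.
    rng = np.random.default_rng(seed)
    U0 = rng.standard_normal((d, r))
    V0 = rng.standard_normal((d, r))
    if model == 'I':
        Z = np.abs(U0) @ np.abs(V0).T
    else:  # Type II
        U_t = U0**2 / np.linalg.norm(U0, axis=1, keepdims=True)   # row-wise scaling of U0
        V_t = V0**2 / np.linalg.norm(V0, axis=0, keepdims=True)   # column-wise scaling of V0
        Z = U_t @ V_t.T
    F_bar = Z / Z.sum()
    P_bar = F_bar / F_bar.sum(axis=1, keepdims=True)
    return F_bar, P_bar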

Fig.1 and Fig.2 present the computational results of the four methods for Example 3.1. In terms of the estimation error of the frequency matrix, the error of Algorithm 2.2 is higher than those of spectral estimation and maximum likelihood estimation. In terms of the estimation error of the probability matrix, when $\bar F$ and $\bar P$ are given by Type I, Algorithm 2.2 is the best, followed by rank-constrained maximum likelihood estimation; when $\bar F$ and $\bar P$ are given by Type II, Algorithm 2.2 has estimation errors similar to rank-constrained maximum likelihood estimation and spectral estimation, all of which are lower than maximum likelihood estimation. For the estimation error of the singular subspaces, Algorithm 2.2, spectral estimation, and rank-constrained maximum likelihood estimation all have an advantage over maximum likelihood estimation. As for computational time, Algorithm 2.2, spectral estimation, and maximum likelihood estimation all take less time than rank-constrained maximum likelihood estimation, because the former three do not require solving an optimization problem, whereas the latter uses DC programming to solve a matrix rank-constrained optimization problem.

Example 3.2 Generate matrices $U_0 \in \mathbb{R}^{d\times r}$, $V_0 \in \mathbb{R}^{d\times r}$ whose elements are independent and obey the standard normal distribution, and a diagonal matrix $D \in \mathbb{R}^{d\times d}$ whose diagonal elements obey the distribution beta($\gamma$, $\gamma$). Consider

Type I: $Z = |U_0|\,|V_0|^{\mathsf T} D$, where the elements of $|U_0| \in \mathbb{R}^{d\times r}$, $|V_0| \in \mathbb{R}^{d\times r}$ are the absolute values of the elements of $U_0$, $V_0$;

Type II: $Z = \tilde U_0 \tilde V_0^{\mathsf T} D$, where $\tilde U_0 \in \mathbb{R}^{d\times r}$, $\tilde V_0 \in \mathbb{R}^{d\times r}$ are defined by $(\tilde U_0)_{ij} := \dfrac{(U_0)_{ij}^2}{\|U_0(i,:)\|_2}$, $(\tilde V_0)_{ij} := \dfrac{(V_0)_{ij}^2}{\|V_0(:,j)\|_2}$.

Define $\bar F \in \mathbb{R}^{d\times d}$ and $\bar P \in \mathbb{R}^{d\times d}$ by $\bar F := \dfrac{Z}{\|Z\|_1}$, $\bar P(i,:) := \dfrac{\bar F(i,:)}{\|\bar F(i,:)\|_1}$, $i = 1, 2, \ldots, d$.

Let $\rho := \dfrac{\max_{1\le i\le d}\pi_i}{\min_{1\le i\le d}\pi_i}$, where $\pi = \bar F e$; this ratio characterizes how balanced the stationary distribution of the Markov model is. The smaller the value of $\rho$, the closer the stationary distribution is to equilibrium; otherwise, it is more non-equilibrium. Compared with Example 3.1, Example 3.2 can be seen as a Markov model with a "non-equilibrium" stationary distribution. Fig.3 and Fig.4 show the computational results of the four methods for Example 3.2, where $\gamma = 0.1$. For the estimation error of the frequency matrix, Algorithm 2.2 does not significantly differ from spectral estimation and is lower than maximum likelihood estimation; for the estimation error of the probability matrix, Algorithm 2.2 and rank-constrained maximum likelihood estimation are lower than spectral estimation and maximum likelihood estimation; for the estimation error of the singular subspaces, rank-constrained maximum likelihood estimation is the best. Furthermore, in order to examine the numerical performance of spectral estimation, rank-constrained maximum likelihood estimation, and Algorithm 2.2 as the stationary distribution of the Markov model shifts from non-equilibrium to equilibrium, we let $\gamma = 0.1, 0.2, \ldots, 1$ (beta(1, 1) is the uniform distribution on [0, 1]) in Example 3.2.

Fig.5 shows the computational results of the three methods. It can be seen that when $\gamma$ is relatively small, rank-constrained maximum likelihood estimation is the best; as $\gamma$ increases, the stationary distribution of the Markov model tends to equilibrium, the differences between the three methods decrease, and the estimation errors display a decreasing trend. In fact, as the stationary distribution of the Markov model tends to equilibrium, the constant $\alpha$ in Assumption 2.1 can be taken larger, from which it can be inferred that the estimation error of Algorithm 2.2 decreases.

In summary, for Markov models with an equilibrium stationary distribution, Algorithm 2.2 can quickly obtain high-quality estimates of the low-rank frequency and probability matrices. For Markov models with a non-equilibrium stationary distribution, rank-constrained maximum likelihood estimation is more effective, but its computational cost is too high. Therefore, Algorithm 2.2 is suitable for the learning problem of high-dimensional, low-rank Markov models.

3.2 Movement trajectory analysis of taxis in Manhattan Island, New York City, USA

The yellow taxi operation trajectory dataset of Manhattan Island, New York City, USA, made public in 2016, records the itineraries of $1.1 \times 10^7$ passengers. Our goal is to divide Manhattan Island into several regions based on this dataset, so that passengers in the same region have similar preferences for traveling to the same destinations. Such a partition can assist in balancing urban transportation supply and demand and in allocating resources.

Similar to [16], Manhattan Island is first subdivided into multiple 0.001° × 0.001° square grids according to longitude and latitude (40.70°−40.82° N, 73.93°−74.02° W). Each small grid is then approximately taken as a state of a Markov process, and the journey of a passenger from one grid to another is regarded as a state transition of the process. To eliminate interference, we only select grids that have been the starting point more than 1000 times as effective states. Then, the low-rank spectral estimation Algorithm 2.2 is used to estimate the state transition probability matrix of the Markov process, and the k-means clustering method is applied to the left singular subspace of the estimated matrix to obtain a clustering-based partition. Fig.6-Fig.8 show the clustering results when k = r is 5, 6, and 7, respectively.
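The partitioning step just described can be summarized as follows (a sketch; we use scikit-learn's KMeans for the k-means step, and the function name is ours):

import numpy as np
from sklearn.cluster import KMeans

def cluster_states(P_hat, k):
    # Cluster the states by k-means on the left singular subspace associated
    # with the k largest singular values of the estimated transition matrix.
    U, _, _ = np.linalg.svd(P_hat, full_matrices=False)
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U[:, :k])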

Considering that people have different travel purposes at different times of day, we divide the taxi operation dataset into three time periods, namely morning (06:00−11:59), afternoon (12:00−17:59), and evening (18:00−23:59). Under this division, the numbers of effective states for the three time periods are 803, 1036, and 1130, respectively. We take k = r = 5 and use Algorithm 2.2 to estimate the state transition probability matrix for each time period and cluster the states. The clustering results are shown in Fig.9 (morning), Fig.10 (afternoon), and Fig.11 (evening).

4 Summary

This paper proposes a low-rank spectral estimation method for estimating the state transition frequency matrix and probability matrix of Markov models. With the help of an approximate projection algorithm onto the rank-constrained frequency matrix set, our method obtains low-rank estimates of the true matrices, thus overcoming the limitation of the spectral estimation method in [16], whose estimates may fail to satisfy the low-rank condition. In addition, our method does not require solving a matrix optimization problem, so its computational complexity is low and it can be applied to learning large low-rank Markov models. We have tested synthetic data and the Manhattan Island taxi trajectory data of New York City, USA, and the numerical experiments show that the low-rank spectral estimation method is effective. In future work, we will study the relationship between the statistical error bound of the low-rank spectral estimation algorithm and the lower bound of the minimax error.

References

[1]

Anderson T W, Goodman L A. Statistical inference about Markov chains. Ann Math Statist 1957; 28(1): 89–110

[2]

Benczúr A A, Csalogány K, Sarlós T. On the feasibility of low-rank approximation for personalized PageRank. In: Special Interest Tracks and Posters of the 14th International Conference on World Wide Web. New York: ACM, 2005, 972–973

[3]

Benson A R, Gleich D F, Lim L-H. The spacey random walk: a stochastic process for higher-order data. SIAM Rev 2017; 59(2): 321–345

[4]

Bi S J, Pan S H. Error bounds for rank constrained optimization problems and applications. Oper Res Lett 2016; 44(3): 336–341

[5]

Huang Q Q, Kakade S M, Kong W H, Valiant G. Recovering structured probability matrices. In: 9th Innovations in Theoretical Computer Science, LIPIcs Leibniz Int Proc Inform, Vol 94. Wadern: Schloss Dagstuhl Leibniz-Zent Inform, 2018, 46

[6]

Jain P, Meka R, Dhillon I. Guaranteed rank minimization via singular value projection. In: Advances in Neural Information Processing Systems 23 (24th Annual Conference on Neural Information Processing Systems 2010), Vol 1. La Jolla, CA: NIPS, 2010, 937–945

[7]

Li G Y, Mordukhovich B S, Nghia T T A, Pham T S. Error bounds for parametric polynomial systems with applications to higher-order stability analysis and convergence rates. Math Program 2018; 168(1/2): Ser B, 313–346

[8]

Li Y H, Ren T Y, Shao P D, Song W P, Li Z Y, Gou Q Z. Development of driving cycle of bus in Xi’an City based on Markov chain. China Sciencepaper 2019; 14(2): 121–128

[9]

Liu Y, Kang C G, Gao S, Xiao Y, Tian Y. Understanding intra-urban trip patterns from taxi trajectory data. J Geographical Systems 2012; 14(4): 463–483

[10]

Negahban S, Oh S, Shah D. Rank centrality: ranking from pairwise comparisons. Oper Res 2017; 65(1): 266–287

[11]

Pang J-S. Error bounds in mathematical programming. Mathematical Programming 1997; 79(1/2/3): Ser B, 299–332

[12]

Sanders J, Proutière A, Yun S-Y. Clustering in block Markov chains. Ann Statist 2020; 48(6): 3488–3512

[13]

Tseng P. Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math Program 2010; 125(2): Ser. B, 263–295

[14]

Xu X B, Li Z P. Research progress and prospects for application of reinforcement learning in operations research. Operations Research and Management Science 2020; 29(5): 227–239

[15]

Zhai Y, Liu J, Chen J, Xing X C, Du J, Li H, Zhu J. Analysis of dockless bike-sharing fleet size based on Markov chain. J Beijing Jiaotong Univ 2019; 43(5): 27–36

[16]

Zhang A R, Wang M D. Spectral state compression of Markov processes. IEEE Trans Inform Theory 2020; 66(5): 3202–3231

[17]

Zhou Z R, So A M-C. A unified approach to error bounds for structured convex optimization problems. Math Program 2017; 165(2): Ser A, 689–728

[18]

Zhu Z W, Li X D, Wang M D, Zhang A R. Learning Markov models via low-rank optimization. Oper Res 2022; 70(4): 2384–2398

RIGHTS & PERMISSIONS

Higher Education Press 2024
