1. Department of Basic Courses, Guangzhou Maritime University, Guangzhou 510725, China
2. Department of Mathematics, South China University of Technology, Guangzhou 510640, China
lizhixiu1982@163.com
bishj@scut.edu.cn
Abstract
This paper proposes a low-rank spectral estimation algorithm for learning Markov models. First, an approximate projection algorithm for the rank-constrained frequency matrix set is proposed, and its local Lipschitzian error bound is established. Then, a low-rank spectral estimation algorithm for the state transition frequency matrix and probability matrix of a Markov model is proposed, which applies the approximate projection algorithm to correct the maximum likelihood estimate of the frequency matrix; we prove that, under appropriate conditions, the estimation errors of the low-rank spectral estimation and the maximum likelihood estimation differ only by a multiplicative constant. Finally, numerical comparisons with the prevailing maximum likelihood estimation, spectral estimation, and rank-constrained maximum likelihood estimation show that the low-rank spectral estimation algorithm is effective.
The learning of Markov models arises in fields such as statistics, systems engineering, data science, and management [1-3, 14, 16]. For example, by modeling urban taxi movement trajectories as a Markov process and clustering its states via estimates of the state transition frequency matrix and probability matrix, potential urban transportation patterns can be revealed to facilitate the allocation of taxi resources [3, 9]. Similarly, learning a corresponding Markov model can assist in designing urban bus routes [8], personalized PageRank [2], and data ranking in e-commerce [10]. It should be pointed out that Markov processes arising from the above problems normally possess a huge state space, but their state transition frequency matrices and probability matrices have been shown to be low-rank or approximately low-rank [3, 9, 16]. Even when the state space is huge, if the state transition frequency matrix is low-rank, the process can be learned accurately from a short empirical trajectory [12, 16, 18]. For further examples and applications of low-rank Markov processes, interested readers may refer to [3, 5, 12, 15, 16] and the references therein.
In recent years, many scholars have studied the estimation of the state transition frequency matrix and probability matrix of low-rank Markov models [5, 16, 18]. As far as we know, existing estimation algorithms with polynomial-time complexity bounds cannot guarantee that the estimated frequency and probability matrices satisfy the low-rank condition. For example, Zhang and Wang [16] proposed a spectral estimation method for learning low-rank Markov models by combining low-rank truncated singular value decomposition of the maximum likelihood estimate of the frequency matrix with a non-negative projection, established a statistical error bound for the estimate, and proved that this bound differs from the minimax lower bound by only one logarithmic factor of the Markov chain trajectory length. However, as mentioned in [16], owing to the non-negative projection required by the spectral estimation algorithm, the final frequency and probability estimation matrices are not necessarily low-rank. We have also verified this through numerical experiments and found that the spectral estimation matrices are hardly ever low-rank, and in some cases are even full-rank. This is also the main reason why the error of spectral estimation fails to be near optimal. To overcome this defect, [18] proposed nuclear norm regularized maximum likelihood estimation and rank-constrained maximum likelihood estimation methods for state transition matrices, established statistical error bounds for them, and proved that these bounds differ from the minimax lower bound by only one logarithmic factor of the state space dimension, which is theoretically near optimal. Note, however, that the optimal solution of the nuclear norm regularized problem does not necessarily satisfy the low-rank condition; the optimal solution of the rank-constrained optimization problem does, but computing it is generally NP-hard, so [18] designed a class of DC (difference of convex functions) programming algorithms to approximate the solution of the rank-constrained maximum likelihood estimation problem. In any case, nuclear norm regularized and rank-constrained maximum likelihood estimation require a large amount of computation, because a singular value decomposition is needed at each step of the algorithm; hence the estimation methods of [18] are not suitable for large-scale Markov model learning. In addition, [5] designed a low-rank recovery algorithm for state transition matrices, but its estimation matrix is not necessarily non-negative, and thus not always a low-rank state transition matrix. Inspired by the above research, this paper seeks a method that quickly obtains a low-rank frequency matrix and probability matrix for learning a low-rank Markov model.
Another area related to this paper is the study of error bounds for sets. Error bounds play a core role in the theory and algorithms of optimization, and their study has long been a focus and challenge in the field [4, 7, 11, 13, 17]. Reference [11] proved that a convex polyhedron admits a global Lipschitz error bound, whereas for general convex feasible systems Lipschitz error bounds exist only under certain constraint qualifications. Moreover, [11] pointed out that for general non-convex feasible systems even a global error bound of Hölder type can hardly be established; existing results mainly provide local Hölder error bounds for subanalytic and polynomial systems [7, 11]. As a special class of non-convex feasible systems, rank-constrained sets have received limited attention in error bound research. For a feasible system formed by the intersection of a closed convex matrix set and a rank-constrained set, [4] established a local Lipschitz error bound estimating the distance from any matrix in the closed convex set to the feasible system, provided the associated multivalued mapping is calm at the origin. However, verifying the calmness of the multivalued mapping is essentially as difficult as establishing the error bound itself. Therefore, the error bound for a general matrix rank-constrained feasible system remains open.
The main work of this paper is to propose a low-rank spectral estimation algorithm for learning Markov models based on an approximate projection algorithm for the rank-constrained frequency matrix set. Given a positive integer $r \le d$, this paper defines the rank-constrained frequency matrix set as
$$\Omega := \{ Z \in \mathbb{R}^{d \times d} : \operatorname{rank}(Z) \le r,\ \mathbf{1}_d^{\top} Z \mathbf{1}_d = 1,\ Z \ge 0 \},$$
where $\operatorname{rank}(Z)$ stands for the rank of $Z$, $\mathbf{1}_d$ represents the $d$-dimensional column vector with all elements equal to 1, and $Z \ge 0$ indicates that its elements satisfy $z_{ij} \ge 0$ for $i, j = 1, \ldots, d$. Obviously, $\frac{1}{d^2}\mathbf{1}_d\mathbf{1}_d^{\top} \in \Omega$, so the set $\Omega$ is non-empty and bounded.
First, this paper establishes the local Lipschitz error bound of the set $\Omega$, that is, an estimate of the $\ell_1$-norm distance from any frequency matrix to $\Omega$ up to a Lipschitz constant. Building on this, a low-rank spectral estimation algorithm for the state transition frequency matrix and probability matrix of a Markov model is proposed. The state transition frequency matrix and probability matrix produced by our algorithm satisfy the low-rank condition, thus overcoming the limitation that the spectral estimate in [16] is not a low-rank state transition matrix. Besides, our method does not require solving an optimization problem, so it has low computational complexity compared with the rank-constrained maximum likelihood estimation [18]. We further prove that the estimation errors of the low-rank spectral estimation and the maximum likelihood estimation differ only by a multiplicative constant under the condition that the stationary probability of each state is at least α. Finally, the proposed method is numerically compared with maximum likelihood estimation [1], spectral estimation [16], and rank-constrained maximum likelihood estimation [18], showing that our method is effective; in addition, we apply the low-rank spectral estimation algorithm combined with the k-means clustering algorithm to the analysis of taxi trajectories on Manhattan Island, New York City, USA.
At the end of this section, we present the main notations of this paper.
(1) Denote by $\mathbf{1}_{d \times d}$ the $d \times d$ matrix with all elements equal to 1, and by $\mathbf{1}_d$ the $d$-dimensional column vector with all elements equal to 1.
(2) For any matrix $Z \in \mathbb{R}^{d \times d}$, $\|Z\|_\infty$ denotes the infinity norm, $\|Z\|_1$ the $\ell_1$-norm, and $\|Z\|_F$ the F(robenius) norm; the $i$-th row vector of $Z$ is written as $Z_{i\cdot}$. The $\ell_1$-norm distance from $Z$ to a set $S \subseteq \mathbb{R}^{d \times d}$ is defined as $\operatorname{dist}_1(Z, S) := \inf_{Y \in S} \|Z - Y\|_1$.
(3) Given $x \in \mathbb{R}^d$, define the $\ell_1$-norm as $\|x\|_1 := \sum_{i=1}^{d} |x_i|$.
(4) For any matrix $Z \in \mathbb{R}^{d \times d}$, write the singular value decomposition as $Z = \sum_{i=1}^{d} \sigma_i(Z)\, u_i v_i^{\top}$, where $\sigma_i(Z)$ is the $i$-th largest singular value of $Z$, and $u_i$, $v_i$ are the corresponding left and right singular vectors, $i = 1, \ldots, d$. As is well known, $\|Z\|_F^2 = \sum_{i=1}^{d} \sigma_i^2(Z)$.
(5) Define , where is constant.
2 Algorithm and its properties
In this section, we propose a low-rank spectral estimation algorithm for the state transition frequency matrix and probability matrix of a Markov model. We consider a discrete-time Markov process with $d$ states, and assume that its state transition probability matrix is $P$ and frequency matrix is $F$, where the low-rank condition $\operatorname{rank}(F) \le r$ is satisfied. The aim is to estimate the matrices $F$ and $P$ from a Markov chain trajectory of length n + 1. Firstly, we present the assumption of this section.
Assumption 2.1 The Markov chain is ergodic, and there exists a constant α > 0 such that the stationary probability of every state is at least α.
Assumption 2.1 is also required for the theoretical analysis in [16]. Denote the stationary distribution of the Markov chain by $\pi = (\pi_1, \ldots, \pi_d)^{\top}$. It follows from [16] that $F = \operatorname{diag}(\pi) P$, where $\operatorname{diag}(\pi)$ indicates the diagonal matrix whose $i$-th diagonal element is $\pi_i$. Since $P \mathbf{1}_d = \mathbf{1}_d$, we have $F \mathbf{1}_d = \pi$, so Assumption 2.1 is equivalent to $\min_{1 \le i \le d} (F \mathbf{1}_d)_i \ge \alpha$. When α is relatively small, the probability of some states appearing in the trajectory is extremely low, which poses a challenge to the estimation of the state transition frequency matrix and probability matrix.
2.1 Approximate projection algorithm for the rank-constrained frequency matrix set
As is widely known, for any matrix, computing its exact projection onto $\Omega$ is NP-hard [6]. In the following, we propose a fast approximate projection algorithm onto $\Omega$ and establish its local Lipschitz error bound.
Algorithm 2.1 (approximate projection of $F$ onto $\Omega$)
Input: $F$, $r$, κ, μ, η, iteration step T.
(S1) Compute . If , output and stop; otherwise go to (S2).
(S2) If , output and stop; otherwise go to (S3).
(S3) Set , k = 1, , where , .
Compute
while
if
;
;
else
;
end if
k = k + 1;
end while
Output , end.
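Because several inline formulas of Algorithm 2.1 were lost in reproduction, the following Python sketch (for concreteness; the paper's own experiments use Matlab) illustrates one natural reading of the scheme: a truncated singular value decomposition followed by a bisection on the mixing parameter of a convex combination with a nonnegative rank-one matrix built from column weights w (cf. Note 2.1 (ii) below). The function name, the uniform fallback for w, and the bisection form are our assumptions, not the paper's exact steps.

import numpy as np

def approx_project(F, r, T=50):
    # Hedged sketch: approximately project F onto
    # {Z : rank(Z) <= r, sum of all entries = 1, Z >= 0}.  Assumes r >= 2.
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    Fr = (U[:, :r - 1] * s[:r - 1]) @ Vt[:r - 1, :]  # best rank-(r-1) part
    Fr = Fr / Fr.sum()                               # rescale entries to sum to 1
    if Fr.min() >= 0:
        return Fr                                    # already feasible
    # Rank-one correction 1*w^T built from the nonnegative column weights of F.
    w = np.maximum(F, 0).sum(axis=0)
    w = w / w.sum() if w.sum() > 0 else np.full(F.shape[1], 1.0 / F.shape[1])
    W = np.outer(np.full(F.shape[0], 1.0 / F.shape[0]), w)  # W >= 0, sums to 1
    lo, hi = 0.0, 1.0                                # invariant: t = hi is feasible
    for _ in range(T):
        t = 0.5 * (lo + hi)
        if ((1 - t) * Fr + t * W).min() >= 0:
            hi = t                                   # feasible: shrink the correction
        else:
            lo = t
    return (1 - hi) * Fr + hi * W                    # rank <= r, nonnegative, sums to 1

Since the output is a convex combination of a rank-(r−1) matrix and a rank-one matrix, it lies in the feasible set, in line with Lemma 2.1 below.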
Lemma 2.1 For any $F$, the output matrix $F_0$ of Algorithm 2.1 satisfies $\operatorname{rank}(F_0) \le r$ and $F_0 \in \Omega$.
Proof If , then and , so the conclusion holds. If and , then and , so the conclusion also holds.
Next, assume and . It follows from (S3) in Algorithm 2.1 that is the final output and can be expressed as , where and . Since 0, , and . Hence, . Noting that all in (S3), we obtain 0.
The following proposition establishes the local Lipschitz error bound of the rank-constrained frequency matrix set, namely an estimate of the $\ell_1$-norm distance from any matrix to the set $\Omega$.
Proposition 2.1 For any $F$, let the output of Algorithm 2.1 be $F_0$. Then the following error bound inequality holds:
Proof Fix any . It follows from Lemma 2.1 that . Since is the $\ell_1$-norm distance, . We demonstrate the claim in the following three cases.
Case 1 . Since , . So
Thus , and inequality (2.1) holds.
Case 2 and . Then . Based on , we can get
Thus, , and inequality (2.1) holds.
Case 3 and . If , then
Therefore, inequality (2.1) is established. Now assume . It follows from Algorithm 2.1 (S3) that , , and combining , it can be seen that the elements of F0 increase with k. Hence, is a non-decreasing sequence and a non-increasing sequence, and
Note that , , so , . Thus
where F0 is the final output of Algorithm 2.1 (S3). Using , , and inequality (2.2), it follows that
Therefore, inequality (2.1) is established.
Note 2.1 (i) Although the right side of the error bound inequality in Proposition 2.1 contains the dimensional factor d, namely , the bound improves when is a dense matrix with relatively uniform elements. In future work, we will study the case where arises from the frequency matrix estimation of a low-rank Markov model.
(ii) Note that when μ = n and T = 1, the right side of the error bound inequality in Proposition 2.1 is minimal. This is because we scaled up to in inequality (2.2). In fact, when , , so w describes the weights of the columns of matrix F, and applying it to correct F0 can improve the quality of the approximate projection.
2.2 Low-rank spectral estimation algorithm
Define $n_{ij} := \#\{1 \le t \le n : X_{t-1} = S_i,\ X_t = S_j\}$, namely $n_{ij}$ is the number of transitions from state $S_i$ to state $S_j$ along the trajectory. Define the matrices $\hat F$ and $\hat P$ as follows:
$$\hat F := \Big(\frac{n_{ij}}{n}\Big)_{1 \le i, j \le d}, \qquad \hat P := \Big(\frac{n_{ij}}{\sum_{l=1}^{d} n_{il}}\Big)_{1 \le i, j \le d}.$$
It was proved in [1] that $\hat F$ and $\hat P$ are the maximum likelihood estimates of $F$ and $P$, respectively, and that they are strongly consistent. However, the maximum likelihood estimation does not exploit the low-rank structure of $F$. Considering the low-rankness of the matrices $F$ and $P$ to be estimated, we present the low-rank spectral estimation algorithm for Markov processes.
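Before stating the algorithm, here is a minimal Python sketch of the maximum likelihood estimates $\hat F$ and $\hat P$ computed from a trajectory; the convention of setting unvisited rows of $\hat P$ to the uniform distribution is our assumption.

import numpy as np

def mle_estimates(traj, d):
    # n_ij counts transitions from state i to state j along X_0, ..., X_n
    # (states encoded as integers 0..d-1).
    n = len(traj) - 1
    counts = np.zeros((d, d))
    for a, b in zip(traj[:-1], traj[1:]):
        counts[a, b] += 1.0
    F_hat = counts / n                               # entries of F_hat sum to 1
    row = counts.sum(axis=1, keepdims=True)
    P_hat = np.divide(counts, row,
                      out=np.full((d, d), 1.0 / d),  # unvisited rows: uniform (our convention)
                      where=row > 0)
    return F_hat, P_hat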
Algorithm 2.2 (low-rank spectral estimation algorithm)
(S1) Compute the corrected maximum likelihood estimate $F_1$:
;
For i = 1 to d
if
;
elseif
;
else
;
end if
end for
(S2) Use Algorithm 2.1, with F1, r, κ, μ, η, T as the input, to compute the estimate $F_2$ of $F$.
(S3) Compute the estimate $P_2$ of $P$, where $(P_2)_{ij} := (F_2)_{ij} / \sum_{l=1}^{d} (F_2)_{il}$ for all $i, j$.
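The whole pipeline can be summarized by the hedged sketch below, which composes the maximum likelihood estimates with the approximate projection and a final row normalization (using mle_estimates and approx_project from the sketches above). Since the exact formulas of the (S1) correction were not reproduced, the sketch feeds the plain maximum likelihood estimate into the projection; this is an assumption, not the paper's exact step.

import numpy as np

def low_rank_spectral_estimate(traj, d, r):
    # (S1)-(S3) composed; the (S1) row correction is omitted here.
    F_hat, _ = mle_estimates(traj, d)
    F2 = approx_project(F_hat, r)                    # low-rank frequency estimate
    row = F2.sum(axis=1, keepdims=True)
    P2 = np.divide(F2, row, out=np.full((d, d), 1.0 / d), where=row > 0)
    return F2, P2                                    # rank(P2) <= rank(F2) <= r

Note that row normalization cannot increase rank, so P2 inherits the low-rank property of F2.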
The following Theorem 2.1(a) shows what Algorithm 2.2 (S1) yields for $F_1$. It follows from Lemma 2.1 that $F_2$ is a low-rank frequency matrix, so the $P_2$ generated by Algorithm 2.2 (S3) is well defined and is a low-rank state transition matrix, namely $\operatorname{rank}(P_2) \le r$. Therefore, the estimates $F_2$ and $P_2$ in Algorithm 2.2 meet the low-rank condition, overcoming the defect of spectral estimation [16]. In particular, Algorithm 2.2 does not need to solve a matrix optimization problem, so its computational cost is small. The following are the main conclusions of this section.
Theorem 2.1 Assume that F1, $F_2$, $P_2$ are generated by Algorithm 2.2. If Assumption 2.1 holds, then the following conclusions hold.
(a) and
(b) ;
(c) .
Proof (a) Fix any index . If , obviously and , . If , then . Using and
m+ > m−. Hence, if , , and , . Direct verification shows that . So . Based on Assumption 2.1 and the definitions of m+, m−, it follows that
Similarly, we can prove . With Assumption 2.1, . Combining the above,
(b) From the definitions of and , it follows that . The conclusion then follows by calculation from Proposition 2.1, the result of (a), and the relationship between the $\ell_1$-norm and the F(robenius) norm.
(c) It can be seen that , . With [16, Lemma 2] and , , we can get
With the result of (b), . □
Note 2.2 (i) Theorem 2.1 shows that the estimation errors of Algorithm 2.2 and the maximum likelihood estimation differ only by a multiplicative constant. Note that the right sides of the inequalities in Theorem 2.1 (b) and (c) contain the dimensional factor d of the state space. This is because we directly scaled up to in inequality (2.3). In fact, when is a dense matrix with relatively uniform elements, , which further improves the conclusions in Theorem 2.1 (b) and (c).
(ii) By [16, Lemma 7], there exist constants and such that holds with probability at least , where stands for the spectral norm of a matrix, for the infinite norm of a vector, and for the mixing time of the Markov chain. In this way, combining the inequality with Theorem 2.1, we can work out the statistical error bound of the low-rank spectral estimates generated by Algorithm 2.2.
3 Numerical experiments
In this section, we first compare the numerical performance of the low-rank spectral estimation Algorithm 2.2 with spectral estimation [16], maximum likelihood estimation [1], and rank-constrained maximum likelihood estimation [18] on synthetic data. Then, we combine Algorithm 2.2 with the k-means clustering algorithm to analyze the public taxi trajectory dataset of Manhattan Island, New York City, USA and reveal its potential transportation patterns. All numerical tests in this section were conducted in Matlab R2019a under Windows 10 on a laptop with an Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz and 8.00GB RAM.
3.1 Synthetic data experiments
First, the true frequency matrix and probability matrix with rank r are generated through Examples 3.1 and 3.2 below. Then, based on them, a Markov chain with trajectory length n = round(krd log(d)) is simulated on a computer.
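A minimal simulation routine might look as follows; since the text does not specify the initial distribution, this sketch starts from a uniformly random state, which is an assumption.

import numpy as np

rng = np.random.default_rng(2024)

def simulate_chain(P, n):
    # Draw X_0 uniformly, then perform n transitions according to P.
    d = P.shape[0]
    x = int(rng.integers(d))
    traj = [x]
    for _ in range(n):
        x = int(rng.choice(d, p=P[x]))               # one state transition
        traj.append(x)
    return traj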
In Algorithm 2.2, we set , κ = 0.001, , , T = 50. Assume that F and P are the estimates of the true frequency matrix and probability matrix, respectively; we measure the estimation performance with the following three quantities:
where , are the matrices composed of the left and right singular vectors corresponding to the leading r singular values of P, and , are those corresponding to the leading r singular values of the true probability matrix; can be regarded as the estimation error in the characteristic subspace. Since the rank-constrained maximum likelihood estimation [18] only provides an estimate of the state transition probability matrix, we only compare ηp and .
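Since the exact formulas of the three quantities were lost in reproduction, the following sketch shows one standard way to compute a relative Frobenius error and a principal-subspace error via orthogonal projectors; the precise definitions in the paper may differ.

import numpy as np

def eval_metrics(P_true, P_est, r):
    # Relative Frobenius error of the estimate.
    eta_p = np.linalg.norm(P_est - P_true) / np.linalg.norm(P_true)
    # Subspace error: distance between the projectors onto the spans of the
    # leading r left singular vectors of the true and estimated matrices.
    U = np.linalg.svd(P_true)[0][:, :r]
    Ue = np.linalg.svd(P_est)[0][:, :r]
    eta_sub = np.linalg.norm(U @ U.T - Ue @ Ue.T)
    return eta_p, eta_sub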
Example 3.1 Generate matrices U0, V0 with independent elements obeying the standard normal distribution. Consider
Type I: , where the elements of , are the absolute values of the elements of U0, V0;
Type II: , where , is defined as , .
Define and , , .
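For concreteness, a hedged sketch of the Type I construction is given below; normalizing all entries to sum to 1 (for the frequency matrix) and then row-normalizing (for the probability matrix) is our assumed normalization.

import numpy as np

rng = np.random.default_rng(2024)

def make_type1(d, r):
    # |U0| and |V0| have i.i.d. |N(0,1)| entries, so F = |U0||V0|^T is
    # nonnegative with rank at most r.
    U0 = np.abs(rng.standard_normal((d, r)))
    V0 = np.abs(rng.standard_normal((d, r)))
    F = U0 @ V0.T
    F = F / F.sum()                                  # frequency matrix: sums to 1
    P = F / F.sum(axis=1, keepdims=True)             # row-normalize to get P
    return F, P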
Fig.1 and Fig.2 present the computational results of the four methods for Example 3.1. In terms of the estimation error of the frequency matrix, Algorithm 2.2 is higher than spectral estimation and maximum likelihood estimation. In terms of the estimation error of the probability matrix, when the matrices are given by Type I, Algorithm 2.2 is optimal, followed by rank-constrained maximum likelihood estimation; when they are given by Type II, Algorithm 2.2 has estimation errors similar to rank-constrained maximum likelihood estimation and spectral estimation, all lower than maximum likelihood estimation. For the estimation error of the characteristic subspace, Algorithm 2.2, spectral estimation, and rank-constrained maximum likelihood estimation enjoy advantages over maximum likelihood estimation. As for computational time, Algorithm 2.2, spectral estimation, and maximum likelihood estimation all take less time than rank-constrained maximum likelihood estimation, because the former three methods do not require solving an optimization problem while the latter uses DC programming to solve a matrix rank-constrained optimization problem.
Example 3.2 Generate matrices U0, V0 with independent elements obeying the standard normal distribution, and a diagonal matrix whose diagonal elements obey the beta(γ, γ) distribution. Consider
Type I: , where the elements of , are the absolute values of the elements of U0, V0;
Type II: , where , is defined as , .
Define and , , .
It is noted that , where , which can be used to characterize the equilibrium of the stationary distribution of the Markov model: the smaller the value of ρ, the more balanced the stationary distribution, and the larger the value, the less balanced it is. Compared with Example 3.1, Example 3.2 can be seen as a Markov model with a "non-equilibrium" stationary distribution. Fig.3 and Fig.4 show the computational results of the four methods for Example 3.2 with γ = 0.1. For the estimation error of the frequency matrix, Algorithm 2.2 does not differ significantly from spectral estimation and is lower than maximum likelihood estimation; for the estimation error of the probability matrix, Algorithm 2.2 and rank-constrained maximum likelihood estimation are lower than spectral estimation and maximum likelihood estimation; for the estimation error of the characteristic subspace, rank-constrained maximum likelihood estimation is optimal. Furthermore, to examine the numerical performance of spectral estimation, rank-constrained maximum likelihood estimation, and Algorithm 2.2 as the stationary distribution of the Markov model shifts from non-equilibrium to equilibrium, we let γ vary in Example 3.2 (beta(1, 1) is the uniform distribution on [0, 1]).
Fig.5 shows the computational results of the three methods. It can be seen that when γ is relatively small, rank-constrained maximum likelihood estimation is optimal; as γ increases, the stationary distribution of the Markov model tends to equilibrium, the differences between the three methods decrease, and the estimation errors display a downward trend. In fact, as the stationary distribution tends to equilibrium, the constant α in Assumption 2.1 increases, from which it can be inferred that the estimation error of Algorithm 2.2 decreases.
In summary, for Markov models with an equilibrium stationary distribution, Algorithm 2.2 can quickly obtain high-quality estimates of the low-rank frequency and probability matrices. For Markov models with a non-equilibrium stationary distribution, rank-constrained maximum likelihood estimation is more accurate, but its computational cost is too high. Therefore, Algorithm 2.2 is suitable for learning high-dimensional low-rank Markov models.
3.2 Movement trajectory analysis of taxis in Manhattan Island, New York City, USA
The yellow taxi operation trajectory dataset of Manhattan Island, New York City, USA, made public in 2016, records the itineraries of 1.1×10^7 passengers. Our goal is to divide Manhattan Island into several regions based on this dataset, so that passengers in the same region have similar preferences for traveling to the same destinations. Such a partition can assist in balancing urban transportation supply and demand and in allocating resources.
Similar to [16], Manhattan Island is first subdivided into multiple 0.001° × 0.001° square grids according to longitude and latitude (40.70°−40.82° N, 73.93°−74.02° W). Each small grid is then taken approximately as a state of a Markov process, and the journey of a passenger from one grid to another is regarded as a state transition of the process. To eliminate interference, we only select grids that have been the starting point more than 1000 times as effective states. The low-rank spectral estimation Algorithm 2.2 is then used to estimate the state transition probability matrix of the Markov process, and the k-means method is applied to the left singular subspace of the estimated matrix for clustering-based partitioning, as sketched below. Fig.6−Fig.8 show the clustering results when k = r is 5, 6, and 7, respectively.
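A sketch of this clustering step is given below; the use of scikit-learn's KMeans is our implementation choice and is not specified by the paper.

import numpy as np
from sklearn.cluster import KMeans

def cluster_states(P_est, r, k):
    # Embed each state (grid) by the leading r left singular vectors of the
    # estimated transition matrix, then partition the states with k-means.
    U = np.linalg.svd(P_est)[0][:, :r]
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)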
Considering that people have different travel purposes at different times of day, we divide the taxi operation dataset into three time periods: morning (06:00−11:59), afternoon (12:00−17:59), and evening (18:00−23:59). Under this division, the effective numbers of states for the three periods are 803, 1036, and 1130, respectively. We take k = r = 5 and use Algorithm 2.2 to estimate the state transition probability matrix for each time period and cluster the states. The clustering results are shown in Fig.9 (morning), Fig.10 (afternoon), and Fig.11 (evening).
4 Summary
This paper proposes a low-rank spectral estimation method for estimating the state transition frequency matrix and probability matrix of Markov models. With the help of an approximate projection algorithm for the rank-constrained frequency matrix set, our method obtains a low-rank estimate of the true matrix, thus overcoming the limitation that the spectral estimation method of [16] cannot meet the low-rank condition. In addition, our method does not require solving matrix optimization problems, so its computational complexity is very low and it can be applied to large-scale low-rank Markov model learning. Numerical experiments on synthetic data and on the Manhattan Island taxi movement trajectory data of New York City, USA show that the low-rank spectral estimation method is effective. In future work, we will study the relationship between the statistical error bound of the low-rank spectral estimation algorithm and the minimax lower bound.
References
[1] Anderson T W, Goodman L A. Statistical inference about Markov chains. Ann Math Statist, 1957, 28(1): 89–110
[2] Benczúr A A, Csalogány K, Sarlós T. On the feasibility of low-rank approximation for personalized PageRank. In: Special Interest Tracks and Posters of the 14th International Conference on World Wide Web. New York: ACM, 2005, 972–973
[3] Benson A R, Gleich D F, Lim L-H. The spacey random walk: a stochastic process for higher-order data. SIAM Rev, 2017, 59(2): 321–345
[4] Bi S J, Pan S H. Error bounds for rank constrained optimization problems and applications. Oper Res Lett, 2016, 44(3): 336–341
[5] Huang Q Q, Kakade S M, Kong W H, Valiant G. Recovering structured probability matrices. In: 9th Innovations in Theoretical Computer Science, LIPIcs Leibniz Int Proc Inform, Vol 94. Wadern: Schloss Dagstuhl Leibniz-Zent Inform, 2018, 46
[6] Jain P, Meka R, Dhillon I. Guaranteed rank minimization via singular value projection. In: Advances in Neural Information Processing Systems 23 (24th Annual Conference on Neural Information Processing Systems 2010), Vol 1. La Jolla, CA: NIPS, 2010, 937–945
[7] Li G Y, Mordukhovich B S, Nghia T T A, Pham T S. Error bounds for parametric polynomial systems with applications to higher-order stability analysis and convergence rates. Math Program, 2018, 168(1/2), Ser B: 313–346
[8] Li Y H, Ren T Y, Shao P D, Song W P, Li Z Y, Gou Q Z. Development of driving cycle of bus in Xi'an City based on Markov chain. China Sciencepaper, 2019, 14(2): 121–128
[9] Liu Y, Kang C G, Gao S, Xiao Y, Tian Y. Understanding intra-urban trip patterns from taxi trajectory data. J Geographical Systems, 2012, 14(4): 463–483
[10] Negahban S, Oh S, Shah D. Rank centrality: ranking from pairwise comparisons. Oper Res, 2017, 65(1): 266–287
[11] Pang J-S. Error bounds in mathematical programming. Math Program, 1997, 79(1/2/3), Ser B: 299–332
[12] Sanders J, Proutière A, Yun S-Y. Clustering in block Markov chains. Ann Statist, 2020, 48(6): 3488–3512
[13] Tseng P. Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math Program, 2010, 125(2), Ser B: 263–295
[14] Xu X B, Li Z P. Research progress and prospects for application of reinforcement learning in operations research. Operations Research and Management Science, 2020, 29(5): 227–239
[15] Zhai Y, Liu J, Chen J, Xing X C, Du J, Li H, Zhu J. Analysis of dockless bike-sharing fleet size based on Markov chain. J Beijing Jiaotong Univ, 2019, 43(5): 27–36
[16] Zhang A R, Wang M D. Spectral state compression of Markov processes. IEEE Trans Inform Theory, 2020, 66(5): 3202–3231
[17] Zhou Z R, So A M-C. A unified approach to error bounds for structured convex optimization problems. Math Program, 2017, 165(2), Ser A: 689–728
[18] Zhu Z W, Li X D, Wang M D, Zhang A R. Learning Markov models via low-rank optimization. Oper Res, 2022, 70(4): 2384–2398