The large sample property of the iterative generalized least squares estimation for hierarchical mixed effects model

Chunyu WANG , Maozai TIAN

RESEARCH ARTICLE

Front. Math. China, 2023, 18(5): 327–339. DOI: 10.3868/s140-DDD-023-0023-x

Abstract

In many fields, we need to deal with hierarchically structured data. For such data, the hierarchical mixed effects model captures the correlation of variables within the same level by modeling the regression coefficients themselves. Due to the complexity of the random part of this model, finding an effective method to estimate the covariance matrix is an appealing issue. The iterative generalized least squares estimation method was proposed by Goldstein in 1986 and applied to a special case of the hierarchical model. In this paper, we extend the method to the general hierarchical mixed effects model, derive its expressions in detail, and apply it to economic examples.


Keywords

Hierarchical model / iterative generalized least squares estimation / variance-covariance components / maximum likelihood estimation

Cite this article

Chunyu WANG, Maozai TIAN. The large sample property of the iterative generalized least squares estimation for hierarchical mixed effects model. Front. Math. China, 2023, 18(5): 327–339. DOI: 10.3868/s140-DDD-023-0023-x


1 Introduction

An important feature of hierarchically structured data is the correlation between observations within the same level. For example, if we examine the math scores of students from different classes, there should clearly be some correlation between the scores of students in the same class, and this intra-level correlation must be taken into account in the subsequent analysis. Here the mutual independence of random terms assumed by the traditional regression model no longer holds. The hierarchical model reflects the intra-level correlation of the data by introducing a common random term within each level [7, 8]. As an example, consider the simplest hierarchical model (the null model):

$$y_{ij}=\beta+u_j+e_{ij},\qquad j=1,2,\ldots,m,\; i=1,2,\ldots,n_j,$$

where $i,j$ are the subscripts of the first and second level, respectively, and the random terms $e_{ij}$, $u_j$ are mutually independent and satisfy $E(e_{ij})=E(u_j)=0$, $\operatorname{var}(e_{ij})=\sigma_e^2$, $\operatorname{var}(u_j)=\sigma_u^2$. In this model, different individuals in the same level share the same random term $u_j$, and the correlation coefficient between them is

$$\rho=\frac{\operatorname{cov}(y_{ij},y_{i'j})}{\sqrt{\operatorname{var}(y_{ij})\operatorname{var}(y_{i'j})}}=\frac{\sigma_u^2}{\sigma_u^2+\sigma_e^2}.$$

It can be seen that the share of $\sigma_u^2$ in the total variance reflects the intra-level correlation. Thus, estimating the covariance matrix of a hierarchical model is an unavoidable problem. Well-known and commonly used estimation methods include maximum likelihood estimation (MLE), restricted maximum likelihood estimation (RMLE), and empirical Bayes estimation (EB). These methods have shortcomings: MLE and RMLE involve differentiation with respect to vectors, which can be difficult to handle when the random part of the model becomes complex; for EB, the results depend on the choice of estimator used when estimating the prior distribution from empirical data [3]. In recent years, scholars have made new explorations around covariance matrix estimation. Lele and Taper [5] (2002) proposed a composite likelihood method, which reduces the complexity of the algorithm; Wang and Yin [10] (2002) proposed a spectral decomposition method for a class of hierarchical models with balanced data, which yields explicit solutions for the variance components.

The iterative generalized least squares estimation used in this paper was originally proposed by Goldstein [2] (1986) for the variance component model, i.e.,

$$Y_{ij}=\beta_0+\sum_{l=1}^{r}\beta_l x_{l,ij}+\sum_{l=1}^{p}\gamma_l z_{l,j}+(u_j+e_{ij}).$$

In this model, the random part is independent of the explanatory variables $x$. Here, we derive explicit expressions for the iterative generalized least squares estimates in a more general hierarchical model containing random effects (i.e., with random terms of the form $u_{0j}+u_{1j}x_{ij}+e_{ij}$), prove the relevant large-sample properties, and demonstrate the performance of the estimation through simulations.

2 Model

Consider a general two-level mixed-effects model:

$$\begin{cases}y_{ij}=\alpha_{0j}+\alpha_{1j}x_{ij}+e_{ij},\\ \alpha_{0j}=\beta_0+u_{0j},\\ \alpha_{1j}=\beta_1+u_{1j},\end{cases}\tag{2.1}$$

where $i$ is the subscript of the first level, $i=1,2,\ldots,n_j$, and $j$ is the subscript of the second level, $j=1,2,\ldots,m$. Concretely, we may regard the second level as $m$ schools and the first level as the students within each school; the $j$th school has $n_j$ students. In the model, $y_{ij}$ is the response variable, $x_{ij}$ is the explanatory variable, and $e_{ij},u_{0j},u_{1j}$ are random terms that satisfy:

$$E(u_{0j})=E(u_{1j})=E(e_{ij})=0,\quad \operatorname{var}(u_{0j})=\sigma_{u_0}^2,\quad \operatorname{var}(u_{1j})=\sigma_{u_1}^2,\quad \operatorname{cov}(u_{0j},u_{1j})=\sigma_{u_{01}},\quad \operatorname{var}(e_{ij})=\sigma_e^2,$$

where $e_{ij}$ is independent of $(u_{0j},u_{1j})$. Combining the two levels, we obtain the merged form:

$$y_{ij}=\beta_0+\beta_1x_{ij}+(u_{0j}+x_{ij}u_{1j}+e_{ij}).\tag{2.2}$$

In this way, $y_{ij}$ is decomposed into the sum of a fixed part $\beta_0+\beta_1x_{ij}$ and a random part $u_{0j}+x_{ij}u_{1j}+e_{ij}$. Thereby, given $x_{ij}$, the variance of $y_{ij}$ is

$$\operatorname{var}(y_{ij}\mid x_{ij})=\sigma_{u_0}^2+2x_{ij}\sigma_{u_{01}}+x_{ij}^2\sigma_{u_1}^2+\sigma_e^2.$$

The covariance between different individuals in the same school is

$$\operatorname{cov}(y_{i_1j},y_{i_2j}\mid x_{i_1j},x_{i_2j})=\sigma_{u_0}^2+(x_{i_1j}+x_{i_2j})\sigma_{u_{01}}+x_{i_1j}x_{i_2j}\sigma_{u_1}^2.$$

Thus, for the $j$th school, the covariance matrix is

$$V_j=X_j\Omega X_j^T+\sigma_e^2I_{n_j},\tag{2.3}$$

where

$$X_j=\begin{pmatrix}1&x_{1j}\\ \vdots&\vdots\\ 1&x_{n_jj}\end{pmatrix},\qquad \Omega=\begin{pmatrix}\sigma_{u_0}^2&\sigma_{u_{01}}\\ \sigma_{u_{01}}&\sigma_{u_1}^2\end{pmatrix}.$$

In model (2.1), we are interested in the regression coefficients $\beta=(\beta_0,\beta_1)^T$ and the covariance matrix $V=\operatorname{diag}(V_1,V_2,\ldots,V_m)$. Notice that by (2.3), each block $V_j$ of $V$ is determined by the variance-covariance components $\theta=(\sigma_{u_0}^2,\sigma_{u_{01}},\sigma_{u_1}^2,\sigma_e^2)^T$. Therefore, the parameters actually needed in the model are $(\beta^T,\theta^T)^T$. In the following, we derive the iterative generalized least squares estimates of these parameters.

3 Iterative generalized least squares estimation

Rewriting (2.2) in matrix form, we have

$$Y=X\beta+\varepsilon,\tag{3.1}$$

where $X$ is the design matrix, which has full column rank, $\beta=(\beta_0,\beta_1)^T$, and $\varepsilon$ is the error vector. Given a non-singular covariance matrix $V$, the generalized least squares estimate of $\beta$ is

$$\hat\beta=(X^TV^{-1}X)^{-1}X^TV^{-1}Y.\tag{3.2}$$

The estimated covariance is

$$\operatorname{cov}(\hat\beta)=(X^TV^{-1}X)^{-1}.$$

On the other hand, since $E(\varepsilon\varepsilon^T)=V$ and $\varepsilon=Y-X\beta$, we can build a regression model for the covariance matrix $V$ when $\beta$ is known:

$$\varepsilon\varepsilon^T=V+R,\tag{3.3}$$

where $V=\operatorname{diag}(V_1,V_2,\ldots,V_m)$ and $R$ is the error matrix. Each block $V_j$ of $V$ can be expressed as a linear form in the unknown parameter $\theta$:

$$V_j=J_{n_j}\sigma_{u_0}^2+A_j\sigma_{u_{01}}+B_j\sigma_{u_1}^2+I_{n_j}\sigma_e^2,\tag{3.4}$$

where $J_{n_j}$ is the $n_j\times n_j$ matrix of ones, $I_{n_j}$ is the $n_j\times n_j$ identity matrix, and

$$A_j=\begin{pmatrix}2x_{1j}&x_{1j}+x_{2j}&\cdots&x_{1j}+x_{n_jj}\\ x_{1j}+x_{2j}&2x_{2j}&\cdots&x_{2j}+x_{n_jj}\\ \vdots&\vdots&\ddots&\vdots\\ x_{1j}+x_{n_jj}&x_{2j}+x_{n_jj}&\cdots&2x_{n_jj}\end{pmatrix},\qquad B_j=\begin{pmatrix}x_{1j}^2&x_{1j}x_{2j}&\cdots&x_{1j}x_{n_jj}\\ x_{1j}x_{2j}&x_{2j}^2&\cdots&x_{2j}x_{n_jj}\\ \vdots&\vdots&\ddots&\vdots\\ x_{1j}x_{n_jj}&x_{2j}x_{n_jj}&\cdots&x_{n_jj}^2\end{pmatrix}.$$
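As a quick numerical sanity check (our own illustration, not part of the paper), one can verify that the block $V_j=X_j\Omega X_j^T+\sigma_e^2I_{n_j}$ in (2.3) coincides with the linear form (3.4):

```python
import numpy as np

# Check that X_j Ω X_j^T + σ_e² I equals J σ_u0² + A_j σ_u01 + B_j σ_u1² + I σ_e².
rng = np.random.default_rng(0)
nj = 4
x = rng.standard_normal(nj)
s_u0, s_u01, s_u1, s_e = 1.2, 0.5, 1.0, 1.0      # (σ_u0², σ_u01, σ_u1², σ_e²)
Xj = np.column_stack([np.ones(nj), x])
Omega = np.array([[s_u0, s_u01],
                  [s_u01, s_u1]])
V_form1 = Xj @ Omega @ Xj.T + s_e * np.eye(nj)   # as in (2.3)
J = np.ones((nj, nj))
A = x[:, None] + x[None, :]                      # A_j: (i,k) entry x_i + x_k
B = np.outer(x, x)                               # B_j: (i,k) entry x_i x_k
V_form2 = J*s_u0 + A*s_u01 + B*s_u1 + np.eye(nj)*s_e   # as in (3.4)
assert np.allclose(V_form1, V_form2)
```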

Substituting (3.4) into (3.3) and applying the matrix $\operatorname{vec}(\cdot)$ operator, we get

$$\underbrace{\begin{pmatrix}\varepsilon_{11}^2\\ \varepsilon_{11}\varepsilon_{21}\\ \vdots\\ \varepsilon_{n_mm}^2\end{pmatrix}}_{Y^*}=\begin{pmatrix}1\\1\\ \vdots\\1\end{pmatrix}\sigma_{u_0}^2+\begin{pmatrix}2x_{11}\\ x_{11}+x_{21}\\ \vdots\\ 2x_{n_mm}\end{pmatrix}\sigma_{u_{01}}+\begin{pmatrix}x_{11}^2\\ x_{11}x_{21}\\ \vdots\\ x_{n_mm}^2\end{pmatrix}\sigma_{u_1}^2+\begin{pmatrix}1\\0\\ \vdots\\1\end{pmatrix}\sigma_e^2+\operatorname{vec}(R),$$

where $Y^*=\operatorname{vec}(\varepsilon\varepsilon^T)$, and entries of the first three columns that correspond to pairs of observations from different second-level units are zero.

The above equation can be written compactly as

$$Y^*=X^*\theta+R^*,\tag{3.5}$$

where $X^*$ is the design matrix, which has full column rank, and $R^*$ is the error vector. In this way, the problem of estimating the covariance matrix is transformed into generalized least squares estimation of the regression coefficient $\theta=(\sigma_{u_0}^2,\sigma_{u_{01}},\sigma_{u_1}^2,\sigma_e^2)^T$ in the linear regression model (3.5). Thus, the generalized least squares estimate of $\theta$ is

$$\hat\theta=\bigl(X^{*T}V^{*-}X^*\bigr)^{-1}X^{*T}V^{*-}Y^*,\tag{3.6}$$

where $V^*$ is the covariance matrix of $Y^*$ and $V^{*-}$ denotes a generalized inverse of $V^*$ (which is singular). In the normal case it can be shown (for a proof, see Searle et al. [6]) that

$$V^*=(V\otimes V)(I+K),\tag{3.7}$$

where $\otimes$ denotes the Kronecker product and $K$ is the vec-permutation matrix. Further, substituting (3.7) into (3.6) and using the properties of the Kronecker product and the vec-permutation matrix [4, 9], the estimate of $\theta$ in the normal case is obtained:

$$\hat\theta=\bigl(X^{*T}(V^{-1}\otimes V^{-1})X^*\bigr)^{-1}X^{*T}(V^{-1}\otimes V^{-1})Y^*.\tag{3.8}$$

Its estimated covariance matrix is (see the appendix in Section 7 for the derivation):

$$\operatorname{cov}(\hat\theta)=2\bigl(X^{*T}(V^{-1}\otimes V^{-1})X^*\bigr)^{-1}.\tag{3.9}$$

However, in practice both $V$ and $\beta$ are unknown. Therefore, in (3.2) we replace the true $V$ with the current estimate $\hat V$ of the covariance matrix to obtain $\hat\beta$; similarly, in (3.8) we replace the true $\beta$ and $V$ with the current estimates $\hat\beta$ and $\hat V$ to form $\hat Y^*$ and the weights, and then obtain $\hat\theta$. These two updates are iterated until convergence; this is the idea of iterative generalized least squares. In summary, the steps of iterative generalized least squares estimation are as follows.

Given an initial value $\hat\theta^{(0)}$, $\hat V^{(0)}$ is obtained from equation (3.4); given a convergence threshold $\alpha$, the $(t+1)$-th iteration proceeds as follows.

Step 1: $\hat\beta^{(t+1)}=\bigl(X^T(\hat V^{(t)})^{-1}X\bigr)^{-1}X^T(\hat V^{(t)})^{-1}Y$;

Step 2: $\hat\theta^{(t+1)}=\bigl\{X^{*T}\bigl[(\hat V^{(t)})^{-1}\otimes(\hat V^{(t)})^{-1}\bigr]X^*\bigr\}^{-1}X^{*T}\bigl[(\hat V^{(t)})^{-1}\otimes(\hat V^{(t)})^{-1}\bigr]\hat Y^{*(t+1)}$, where $\hat Y^{*(t+1)}=\operatorname{vec}\bigl\{(Y-X\hat\beta^{(t+1)})(Y-X\hat\beta^{(t+1)})^T\bigr\}$;

Step 3: the iteration ends when $\frac{\|\hat\beta^{(t+1)}-\hat\beta^{(t)}\|_\infty}{\|\hat\beta^{(t)}\|_\infty}<\alpha$ and $\frac{\|\hat\theta^{(t+1)}-\hat\theta^{(t)}\|_\infty}{\|\hat\theta^{(t)}\|_\infty}<\alpha$ hold simultaneously; otherwise, return to Step 1. Here $\|\cdot\|_\infty$ is the infinity norm: for any $p$-dimensional vector $x$, $\|x\|_\infty=\max\{|x_1|,|x_2|,\ldots,|x_p|\}$.
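For concreteness, the following is a minimal NumPy sketch of the whole iteration for model (2.1) (our own illustration; the function name `igls`, the initial value, and the tolerances are arbitrary). Step 2 is implemented in the equivalent trace form $[X^{*T}(V^{-1}\otimes V^{-1})X^*]_{ab}=\operatorname{tr}(V^{-1}A_aV^{-1}A_b)$ and $[X^{*T}(V^{-1}\otimes V^{-1})Y^*]_a=\operatorname{tr}(V^{-1}A_aV^{-1}S)$, which avoids forming $n^2\times n^2$ Kronecker products:

```python
import numpy as np

def igls(y, x, groups, tol=1e-6, max_iter=100):
    """Iterative GLS for y_ij = b0 + b1*x_ij + (u0j + u1j*x_ij + e_ij).
    groups[i] is the second-level unit of observation i."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])              # fixed-effects design
    same = (groups[:, None] == groups[None, :]).astype(float)
    # Design matrices of theta = (s_u0^2, s_u01, s_u1^2, s_e^2), cf. (3.4):
    A = [same,                                        # block-diagonal J
         same * (x[:, None] + x[None, :]),            # block-diagonal A_j
         same * np.outer(x, x),                       # block-diagonal B_j
         np.eye(n)]                                   # identity
    beta = np.zeros(2)
    theta = np.array([1.0, 0.0, 1.0, 1.0])            # initial value theta^(0)
    for _ in range(max_iter):
        V = sum(t * Aa for t, Aa in zip(theta, A))    # V assembled via (3.4)
        Vi = np.linalg.inv(V)
        # Step 1: GLS for beta given the current V
        beta_new = np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ y)
        # Step 2: GLS for theta given beta and V, in trace form
        r = y - X @ beta_new
        S = np.outer(r, r)
        G = np.array([[np.trace(Vi @ Aa @ Vi @ Ab) for Ab in A] for Aa in A])
        h = np.array([np.trace(Vi @ Aa @ Vi @ S) for Aa in A])
        theta_new = np.linalg.solve(G, h)
        # Step 3: relative sup-norm convergence check
        db = np.max(np.abs(beta_new - beta)) / max(np.max(np.abs(beta)), 1e-10)
        dt = np.max(np.abs(theta_new - theta)) / max(np.max(np.abs(theta)), 1e-10)
        beta, theta = beta_new, theta_new
        if db < tol and dt < tol:
            break
    return beta, theta
```

Working with the trace form keeps memory at $O(n^2)$ rather than $O(n^4)$; on data simulated as in Section 5 below, one would expect this sketch to recover $(\beta,\theta)$ up to sampling error.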

Iterative generalized least squares thus estimates both the regression coefficients and the covariance matrix through iteration based on generalized least squares. The essential idea behind the covariance estimation is to use the residual outer product $\hat\varepsilon\hat\varepsilon^T$ to estimate the covariance matrix $V$. For the hierarchical mixed effects model of this paper, the covariance matrix is fully determined by the intra- and inter-level variance-covariance components, so estimating it is equivalent to estimating the regression coefficient $\theta$ in the linear regression model (3.5).

4 Properties of iterative generalized least squares estimation

Proposition 1  For model (2.1) and its matrix forms (3.1) and (3.5), if $Y$ follows a multivariate normal distribution and the design matrices $X$, $X^*$ have full column rank, then for the parameters $\beta,\theta$ the iterative generalized least squares estimate is equivalent to the maximum likelihood estimate.

Proof For the $n$-dimensional normal distribution $Y\sim N_n(X\beta,V)$, the log-likelihood function is

$$l=-\frac n2\ln(2\pi)-\frac12\ln|V|-\frac12(Y-X\beta)^TV^{-1}(Y-X\beta).\tag{4.1}$$

For the regression coefficient $\beta$, the estimating equation is

$$\frac{\partial l}{\partial\beta}=-\frac12\bigl(-2X^TV^{-1}Y+2X^TV^{-1}X\beta\bigr)=0.\tag{4.2}$$

Rearranging yields equation (3.2). Hence, for $\beta$, the maximum likelihood estimate is equivalent to the iterative generalized least squares estimate.

The estimation of the variance component $\theta$ is considered below. The log-likelihood function (4.1) can be rewritten as

$$l=-\frac n2\ln(2\pi)+\frac12\ln|V^{-1}|-\frac12\operatorname{tr}(V^{-1}S),\tag{4.3}$$

where $S=(Y-X\beta)(Y-X\beta)^T$. Since $\theta$ parametrizes the covariance matrix $V$, the maximum likelihood estimating equation for $\theta$ is

$$\frac{\partial l}{\partial\theta}=\frac12\operatorname{tr}\Bigl(V\frac{\partial V^{-1}}{\partial\theta}\Bigr)-\frac12\frac{\partial}{\partial\theta}\operatorname{tr}(V^{-1}S)=0.\tag{4.4}$$

In iterative generalized least squares estimation, $\hat\theta$ is obtained from equation (3.6), that is, by minimizing

$$(Y^*-X^*\theta)^TV^{*-}(Y^*-X^*\theta)=[\operatorname{vec}(S-V(\theta))]^TV^{*-}\operatorname{vec}(S-V(\theta)).\tag{4.5}$$

When $Y\sim N_n(X\beta,V)$ holds, minimizing (4.5) is equivalent to minimizing

$$[\operatorname{vec}(S-V(\theta))]^T(V^{-1}\otimes V^{-1})\operatorname{vec}(S-V(\theta)).\tag{4.6}$$

It should be noted that we use different symbols $V(\theta)$ and $V$ to distinguish whether the covariance matrix is treated as a function of the parameter $\theta$. The former denotes the matrix whose vectorization is $X^*\theta$, i.e., $\operatorname{vec}(V(\theta))=X^*\theta$, and thus depends on $\theta$; the latter is the true covariance matrix of the model, treated as a constant in the derivation below. Denote expression (4.6) by $G$. From the properties of matrix vectorization and the trace operation (see Section 7), we get

$$G=\operatorname{tr}\bigl\{[V^{-1}(S-V(\theta))]^2\bigr\}.$$

Taking the partial derivative of the above equation with respect to θ, we get

$$\begin{aligned}\frac{\partial G}{\partial\theta}&=\frac{\partial}{\partial\theta}\operatorname{tr}\bigl[V^{-1}(S-V(\theta))V^{-1}(S-V(\theta))\bigr]\\&=2\operatorname{tr}\Bigl[-V^{-1}SV^{-1}\frac{\partial V(\theta)}{\partial\theta}+V^{-1}V(\theta)V^{-1}\frac{\partial V(\theta)}{\partial\theta}\Bigr]\\&=2\operatorname{tr}\Bigl[S\frac{\partial V^{-1}(\theta)}{\partial\theta}-V(\theta)\frac{\partial V^{-1}(\theta)}{\partial\theta}\Bigr].\end{aligned}$$

Setting this partial derivative equal to $0$, the estimating equation for $\theta$ is obtained:

$$\operatorname{tr}\Bigl(S\frac{\partial V^{-1}(\theta)}{\partial\theta}\Bigr)=\operatorname{tr}\Bigl(V(\theta)\frac{\partial V^{-1}(\theta)}{\partial\theta}\Bigr).$$

This is the same as (4.4). Therefore, for θ, the maximum likelihood estimate is equivalent to the iterative generalized least squares estimate.

From the above proposition, it follows that the properties of maximum likelihood estimation carry over to iterative generalized least squares estimation in the normal case. It is well known that the maximum likelihood estimate of a variance is biased: for example, when $Y\sim N(X\beta,\sigma^2I_n)$, the maximum likelihood estimate of $\sigma^2$ is $\hat\sigma^2_{\mathrm{ML}}=\frac1n(Y-X\hat\beta)^T(Y-X\hat\beta)$ with $E(\hat\sigma^2_{\mathrm{ML}})=\frac{n-\operatorname{rk}(X)}{n}\sigma^2\neq\sigma^2$. Therefore, the variance estimates obtained by iterative generalized least squares are also biased.
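For completeness, the bias can be verified by the standard hat-matrix computation, writing $H=X(X^TX)^{-1}X^T$ and using $(I-H)X=0$ and $E(Y^TQY)=\sigma^2\operatorname{tr}(Q)+\beta^TX^TQX\beta$ for symmetric $Q$:

$$E(\hat\sigma^2_{\mathrm{ML}})=\frac1nE\bigl[(Y-X\hat\beta)^T(Y-X\hat\beta)\bigr]=\frac1nE\bigl[Y^T(I-H)Y\bigr]=\frac{\sigma^2}{n}\operatorname{tr}(I-H)=\frac{n-\operatorname{rk}(X)}{n}\sigma^2.$$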

Regarding the large-sample properties, we first establish them for maximum likelihood estimation and then transfer them to iterative generalized least squares estimation via the above equivalence. For the hierarchical model, since the second-level samples $y_1,y_2,\ldots,y_m$ are independent of each other, the likelihood function is

$$\phi(\gamma)=\prod_{j=1}^mf_j(y_j,\gamma),$$

where $y_j=(y_{1j},y_{2j},\ldots,y_{n_jj})^T$, $j=1,2,\ldots,m$, $\gamma=(\beta^T,\theta^T)^T$ is a $k\times1$ parameter vector, and $f_j$ is the $n_j$-dimensional density function of $y_j$. It should be noted that the samples $y_1,y_2,\ldots,y_m$ are independent but not identically distributed, depending on the common parameter vector $\gamma$. For the maximum likelihood estimate $\hat\gamma_{\mathrm{ML}}$ in this setting, Bradley and Gart [1] proved that $\hat\gamma_{\mathrm{ML}}$ is consistent and asymptotically normal if the following assumptions hold.

Hypothesis I (i) For any $y_j\in R_j$, $\gamma\in\Omega$, the derivatives $\frac{\partial\ln f_j}{\partial\gamma_r}$, $\frac{\partial^2\ln f_j}{\partial\gamma_r\partial\gamma_s}$, $\frac{\partial^3\ln f_j}{\partial\gamma_r\partial\gamma_s\partial\gamma_t}$ exist, $r,s,t=1,2,\ldots,k$; $j=1,2,\ldots,m$.

Hypothesis I (ii) For every density $f_j$ and all $y_j\in R_j$, $\gamma\in\Omega$,

$$\Bigl|\frac{\partial f_j}{\partial\gamma_r}\Bigr|<F_{jr}(y_j),\qquad \Bigl|\frac{\partial^2 f_j}{\partial\gamma_r\partial\gamma_s}\Bigr|<F_{jrs}(y_j),\qquad \Bigl|\frac{\partial^3\ln f_j}{\partial\gamma_r\partial\gamma_s\partial\gamma_t}\Bigr|<H_{jrst}(y_j),$$

where $F_{jr}(y_j)$, $F_{jrs}(y_j)$ are integrable on $R_j$, and

$$\int_{R_j}H_{jrst}(y_j)f_j\,dy_j<M_j,\qquad 0<M_j<\infty,\quad r,s,t=1,2,\ldots,k;\ j=1,2,\ldots,m.$$

Hypothesis II (i) For the sequence of densities $\{f_j\}_{j=1}^m$, we have

$$\sum_{j=1}^m\int_{D_{1j}}f_j\,dy_j=o(1),\qquad \sum_{j=1}^m\int_{D_{2j}}\Bigl\{\frac{\partial\ln f_j}{\partial\gamma_r}\Bigr\}^2f_j\,dy_j=o(n^2),$$

where $D_{1j}=\bigl\{y_j\in R_j:\ \bigl|\frac{\partial\ln f_j}{\partial\gamma_r}\bigr|>n\bigr\}$, $D_{2j}=\bigl\{y_j\in R_j:\ \bigl|\frac{\partial\ln f_j}{\partial\gamma_r}\bigr|<n\bigr\}$, $r=1,2,\ldots,k$.

Hypothesis II (ii) For the sequence of densities $\{f_j\}_{j=1}^m$, we have

$$\sum_{j=1}^m\int_{D_{3j}}f_j\,dy_j=o(1),\qquad \sum_{j=1}^m\int_{D_{4j}}\Bigl\{\frac{\partial^2\ln f_j}{\partial\gamma_r\partial\gamma_s}\Bigr\}^2f_j\,dy_j=o(n^2),$$

where $D_{3j}=\bigl\{y_j\in R_j:\ \bigl|\frac{\partial^2\ln f_j}{\partial\gamma_r\partial\gamma_s}\bigr|>n\bigr\}$, $D_{4j}=\bigl\{y_j\in R_j:\ \bigl|\frac{\partial^2\ln f_j}{\partial\gamma_r\partial\gamma_s}\bigr|<n\bigr\}$, $r,s=1,2,\ldots,k$; and $\lim_{m\to\infty}\frac1m\sum_{j=1}^m\int_{R_j}\frac{\partial\ln f_j}{\partial\gamma_r}\frac{\partial\ln f_j}{\partial\gamma_s}f_j\,dy_j$ exists, $r,s=1,2,\ldots,k$.

Hypothesis II (iii) For $H_{jrst}(y_j)$, $M_j$, we have

$$\sum_{j=1}^m\int_{D_{5j}}f_j\,dy_j=o(1),\qquad \sum_{j=1}^m\int_{D_{6j}}H_{jrst}^2(y_j)f_j\,dy_j=o(n^2),$$

where $D_{5j}=\{y_j\in R_j:\ H_{jrst}(y_j)>n\}$, $D_{6j}=\{y_j\in R_j:\ H_{jrst}(y_j)<n\}$, $r,s,t=1,2,\ldots,k$; and $\frac1n\sum_{j=1}^mM_j<M$, where $M$ is a positive constant.

Hypothesis III

$$\lim_{m\to\infty}\frac1m\sum_{j=1}^m\int_{D_{7j}}\sum_{r=1}^k\Bigl\{\frac{\partial\ln f_j}{\partial\gamma_r}\Bigr\}^2f_j\,dy_j=0,$$

where $D_{7j}=\Bigl\{y_j\in R_j:\ \bigl[\sum_{r=1}^k\bigl(\frac{\partial\ln f_j}{\partial\gamma_r}\bigr)^2\bigr]^{1/2}>\varepsilon\sqrt m\Bigr\}$ for any $\varepsilon>0$.

Theorem 1  Let $Y$ follow a multivariate normal distribution, let $\hat\gamma$ be the iterative generalized least squares estimate of the parameter vector $\gamma$, and let $\gamma_0$ be the true value. Then

(1) under Hypotheses I and II, $\hat\gamma$ is consistent: $\hat\gamma\xrightarrow{P}\gamma_0$ as $m\to\infty$;

(2) under Hypotheses I, II, and III, $\hat\gamma$ is asymptotically normal: $\sqrt m(\hat\gamma-\gamma_0)\xrightarrow{L}N(0,J_0^{-1})$, where $J_0$ is the $k\times k$ matrix with $(r,s)$ entry $\lim_{m\to\infty}\frac1m\sum_{j=1}^mE\Bigl(\frac{\partial\ln f_j}{\partial\gamma_r}\frac{\partial\ln f_j}{\partial\gamma_s}\Bigr)\Big|_{\gamma=\gamma_0}$.

Proof The theorem follows directly from the large-sample properties of $\hat\gamma_{\mathrm{ML}}$ established by Bradley and Gart [1] (1962), combined with the equivalence in Proposition 1.

5 Simulation

To test the performance of the iterative generalized least squares method, we first simulate model (2.1). Set the regression coefficients $\beta=(\beta_0,\beta_1)^T=(10,2)^T$ and the variance-covariance components $\theta=(\sigma_{u_0}^2,\sigma_{u_{01}},\sigma_{u_1}^2,\sigma_e^2)^T=(1.2,0.5,1,1)^T$, and define $\Omega$ as in (2.3). The $x_{ij}$ are generated from the standard normal distribution. The second-level random terms are generated as $(u_{0j},u_{1j})^T\overset{\mathrm{iid}}{\sim}N(0,\Omega)$, $j=1,2,\ldots,m$, and the first-level random terms as $e_{ij}\overset{\mathrm{iid}}{\sim}N(0,\sigma_e^2)$, $i=1,2,\ldots,n_j$. We take $m=100$ and generate $n_j$ from the Poisson distribution with mean 25. Then $y_{ij}$ is obtained from $y_{ij}=\beta_0+\beta_1x_{ij}+(u_{0j}+u_{1j}x_{ij}+e_{ij})$. For illustration, as above, we regard the first and second levels as the student level and the school level, respectively, so that $y_{ij}$ is the response of the $i$th student in the $j$th school. For intuition, we plot scatter plots for two of the schools in Fig.1.

As can be seen in Fig.1, the intercepts and slopes differ across schools, because different schools carry different school-level random terms, which perturb $\beta_0,\beta_1$. We estimate the regression coefficients $\beta$ and variance components $\theta$ by iterative generalized least squares; the results are shown in Tab.1.

The estimated values deviate little from the true values, and the standard deviations of the estimates are also small, indicating that the parameter estimates obtained by iterative generalized least squares fluctuate in a small range around the true values; the method therefore performs well for model (2.1).
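The data-generating process just described can be sketched as follows (illustrative code with an arbitrary seed; `igls` refers to the sketch in Section 3):

```python
import numpy as np

rng = np.random.default_rng(2023)                 # arbitrary seed
beta = np.array([10.0, 2.0])                      # (β0, β1)
theta = np.array([1.2, 0.5, 1.0, 1.0])            # (σ_u0², σ_u01, σ_u1², σ_e²)
Omega = np.array([[theta[0], theta[1]],
                  [theta[1], theta[2]]])
m = 100
nj = rng.poisson(25, size=m)                      # group sizes ~ Poisson(25)
g = np.repeat(np.arange(m), nj)                   # school label of each student
x = rng.standard_normal(nj.sum())
U = rng.multivariate_normal(np.zeros(2), Omega, size=m)   # (u0j, u1j) ~ N(0, Ω)
e = rng.standard_normal(nj.sum()) * np.sqrt(theta[3])
y = beta[0] + beta[1]*x + U[g, 0] + U[g, 1]*x + e

beta_hat, theta_hat = igls(y, x, g)               # estimates to compare with Tab.1
```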

Further, we generalize the model by introducing explanatory variables in the second level, which gives the following hierarchical model:

$$\begin{cases}y_{ij}=\alpha_{0j}+\alpha_{1j}x_{ij}+e_{ij},\\ \alpha_{0j}=\beta_{00}+\beta_{01}\omega_{0j}+u_{0j},\\ \alpha_{1j}=\beta_{10}+\beta_{11}\omega_{1j}+u_{1j}.\end{cases}$$

The assumptions about the distribution and independence of the random terms in the model are the same as in model (2.1). The two levels are combined to obtain the following form:

$$y_{ij}=\beta_{00}+\beta_{01}\omega_{0j}+\beta_{10}x_{ij}+\beta_{11}\omega_{1j}x_{ij}+(u_{0j}+x_{ij}u_{1j}+e_{ij}).$$

Given true values of the parameters $(\beta_{00},\beta_{01},\beta_{10},\beta_{11},\sigma_{u_0}^2,\sigma_{u_{01}},\sigma_{u_1}^2,\sigma_e^2)$, the data $x_{ij},\omega_{0j},\omega_{1j}$ are generated independently from the standard normal distribution. Multiple simulations are performed and the estimates are averaged to obtain Tab.2.
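A sketch of the corresponding data generation (continuing the code above; the numeric true values below are placeholders of our own choosing, since the text does not list them):

```python
# Second-level covariates ω_0j, ω_1j enter both the intercept and the slope.
b00, b01, b10, b11 = 10.0, 1.5, 2.0, -0.8         # placeholder true values
w0 = rng.standard_normal(m)                       # ω_0j
w1 = rng.standard_normal(m)                       # ω_1j
y_ext = (b00 + b01*w0[g]) + (b10 + b11*w1[g])*x + U[g, 0] + U[g, 1]*x + e
```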

Tab.2 shows that the estimates are close to the true values and the standard deviations are small, so the iterative generalized least squares method is also applicable to the extended model.

6 Example

In this section, we apply the hierarchical mixed-effects model to analyze the relationship between GDP and total foreign trade (TMX) for several major cities with export-oriented economies. Seven coastal cities with export-oriented economies, namely Tianjin, Shanghai, Dalian, Qingdao, Guangzhou, Shenzhen and Ningbo, are selected as the study subjects, and their GDP and TMX indicators in recent years are recorded based on data released by the National Bureau of Statistics. For Tianjin and Shanghai, we record data for a total of 25 years (1990‒2014); for the remaining five cities, a total of 9 years (2006‒2014). This imbalance arises because data for municipalities directly under the central government are more accessible and cover a longer period than those of other cities; however, the model and estimation method of this paper place no requirement on the data being balanced and can be used to analyze this data set. Taking the economic indicators of different cities in different years as the first level, the first-level model is developed as follows:

$$\mathrm{GDP}_{ij}=\alpha_{0j}+\alpha_{1j}\mathrm{TMX}_{ij}+e_{ij},\qquad j=1,2,\ldots,7,\; i=1,2,\ldots,n_j,\tag{6.1}$$

where $n_j=9$ or $25$. Since the data are centered, the intercept $\alpha_{0j}$ represents the GDP of the $j$th city when its TMX is at the overall average level $\overline{\mathrm{TMX}}$, which should be related to the city's own average $\overline{\mathrm{TMX}}_j$. Therefore, when building a linear model for the intercept term, we introduce $\overline{\mathrm{TMX}}_j$ as an explanatory variable in the second-level model:

$$\begin{cases}\alpha_{0j}=\beta_{00}+\beta_{01}\overline{\mathrm{TMX}}_j+u_{0j},\\ \alpha_{1j}=\beta_{10}+u_{1j}.\end{cases}\tag{6.2}$$

Combining (6.1) and (6.2), we get

$$\mathrm{GDP}_{ij}=\beta_{00}+\beta_{01}\overline{\mathrm{TMX}}_j+\beta_{10}\mathrm{TMX}_{ij}+(u_{0j}+\mathrm{TMX}_{ij}u_{1j}+e_{ij}).\tag{6.3}$$

The regression coefficients $(\beta_{00},\beta_{01},\beta_{10})$ and variance-covariance components $(\sigma_{u_0}^2,\sigma_{u_{01}},\sigma_{u_1}^2,\sigma_e^2)$ in model (6.3) are estimated by the iterative generalized least squares method; the results are shown in Tab.3.

From Tab.3 we can see that $\hat\beta_{01}<0$: for a given current-year $\mathrm{TMX}_{ij}$, a higher average foreign-trade level $\overline{\mathrm{TMX}}_j$ in recent years is associated with a lower GDP in that year. That is, an economy that depends excessively on foreign trade suffers a negative impact on its development. In addition, the average effect of TMX on GDP, i.e., the slope's expectation, is $\hat\beta_{10}=0.78$, and the slope's standard deviation is $\hat\sigma_{u_1}=0.46$, so the effect $\alpha_{1j}$ of TMX on GDP lies roughly in the interval $(\hat\beta_{10}-\hat\sigma_{u_1},\hat\beta_{10}+\hat\sigma_{u_1})=(0.32,1.24)$. This indicates that the effect of TMX on GDP is positive for all of these cities, but in some cities it can reach about four times the lowest effect. Furthermore, the correlation coefficient $\rho=\sigma_{u_{01}}/\sqrt{\sigma_{u_0}^2\sigma_{u_1}^2}$ between the city-level random terms $u_{0j},u_{1j}$ is positive; therefore, a city with a positive perturbation of the intercept tends to show a larger effect of TMX on GDP.

7 Appendix

Results on matrix derivatives

$$\frac{\partial}{\partial t}\ln|A(t)|=\operatorname{tr}\Bigl(A^{-1}(t)\frac{\partial A(t)}{\partial t}\Bigr),\qquad \frac{\partial A^{-1}(t)}{\partial t}=-A^{-1}(t)\frac{\partial A(t)}{\partial t}A^{-1}(t).$$

Results on matrix vectorization and trace operations

$$\operatorname{vec}(ABC)=(C^T\otimes A)\operatorname{vec}(B),\qquad \operatorname{tr}(AB)=\bigl(\operatorname{vec}(A^T)\bigr)^T\operatorname{vec}(B).$$
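Both identities are easy to check numerically (an illustrative snippet of our own; note that $\operatorname{vec}$ stacks columns, which is Fortran order in NumPy):

```python
import numpy as np

rng = np.random.default_rng(1)
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))
vec = lambda M: M.reshape(-1, order="F")         # column-stacking vec
assert np.allclose(np.kron(C.T, A) @ vec(B), vec(A @ B @ C))   # vec(ABC) = (C^T ⊗ A) vec(B)
assert np.isclose(vec(A.T) @ vec(B), np.trace(A @ B))          # tr(AB) = vec(A^T)^T vec(B)
```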

Vector permutation matrix

Definition: if for any $m\times n$ matrix $A$ we always have $\operatorname{vec}(A)=K\operatorname{vec}(A^T)$, then the $mn\times mn$ matrix $K$ is called the vec-permutation matrix.

Properties: (1) $K^T=K$; (2) $K^2=I$; (3) $(I+K)^2=2(I+K)$; (4) $K(A\otimes B)=(B\otimes A)K$.

From the above properties, the following conclusions can be drawn:

$$\bigl[(V\otimes V)(I+K)\bigr]^{-}=\frac14(I+K)(V^{-1}\otimes V^{-1}),$$

that is, $\frac14(I+K)(V^{-1}\otimes V^{-1})$ is a generalized inverse of $V^*=(V\otimes V)(I+K)$ (note that $V^*$ is singular, so an ordinary inverse does not exist).
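This generalized-inverse identity can also be checked numerically (an illustrative snippet of our own; the helper `commutation` builds $K$ directly from its definition):

```python
import numpy as np

def commutation(n):
    """K with K @ vec(A) = vec(A.T) for n x n matrices (column-stacking vec)."""
    K = np.zeros((n*n, n*n))
    for i in range(n):
        for j in range(n):
            K[i*n + j, j*n + i] = 1.0
    return K

rng = np.random.default_rng(2)
n = 3
M = rng.standard_normal((n, n))
V = M @ M.T + n * np.eye(n)                      # a positive-definite V
K, I = commutation(n), np.eye(n*n)
Vstar = np.kron(V, V) @ (I + K)                  # V* from (3.7); singular
G = 0.25 * (I + K) @ np.kron(np.linalg.inv(V), np.linalg.inv(V))
assert np.allclose(Vstar @ G @ Vstar, Vstar)     # G is a generalized inverse of V*
```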

Derivation of Eq. (3.9). Since

$$\hat\theta=\bigl(X^{*T}(V^{-1}\otimes V^{-1})X^*\bigr)^{-1}X^{*T}(V^{-1}\otimes V^{-1})Y^*,$$

write $M=\bigl(X^{*T}(V^{-1}\otimes V^{-1})X^*\bigr)^{-1}$; then

$$\operatorname{cov}(\hat\theta)=MX^{*T}(V^{-1}\otimes V^{-1})\operatorname{cov}(Y^*)(V^{-1}\otimes V^{-1})^TX^*M^T=MX^{*T}(V^{-1}\otimes V^{-1})(V\otimes V)(I+K)(V^{-1}\otimes V^{-1})^TX^*M^T,\tag{7.1}$$

where $(V\otimes V)(I+K)(V^{-1}\otimes V^{-1})^TX^*=X^*+(V\otimes V)K(V^{-1}\otimes V^{-1})^TX^*$.

Here $X^*=\bigl(\operatorname{vec}(A_1)\,|\,\operatorname{vec}(A_2)\,|\,\operatorname{vec}(A_3)\,|\,\operatorname{vec}(A_4)\bigr)$, where $A_1,A_2,A_3,A_4$ are the design matrices of $\sigma_{u_0}^2$, $\sigma_{u_{01}}$, $\sigma_{u_1}^2$, $\sigma_e^2$, respectively, all of which are symmetric. Thus,

$$\begin{aligned}(V\otimes V)(I+K)(V^{-1}\otimes V^{-1})^TX^*&=X^*+(V\otimes V)K(V^{-1}\otimes V^{-1})^T\bigl(\operatorname{vec}(A_1)\,|\,\cdots\,|\,\operatorname{vec}(A_4)\bigr)\\&=X^*+(V\otimes V)K\bigl(\operatorname{vec}(V^{-1}A_1V^{-1})\,|\,\cdots\,|\,\operatorname{vec}(V^{-1}A_4V^{-1})\bigr)\\&=X^*+(V\otimes V)\bigl(\operatorname{vec}(V^{-1}A_1V^{-1})\,|\,\cdots\,|\,\operatorname{vec}(V^{-1}A_4V^{-1})\bigr)\\&=X^*+(V\otimes V)(V^{-1}\otimes V^{-1})^TX^*\\&=2X^*.\end{aligned}\tag{7.2}$$

Substituting (7.2) into (7.1), we get

$$\operatorname{cov}(\hat\theta)=2\bigl(X^{*T}(V^{-1}\otimes V^{-1})X^*\bigr)^{-1}.$$

References

[1] Bradley R A, Gart J J. The asymptotic properties of ML estimators when sampling from associated populations. Biometrika, 1962, 49(1/2): 205–214

[2] Goldstein H. Multilevel mixed linear model analysis using iterative generalised least squares. Biometrika, 1986, 73(1): 43–56

[3] Goldstein H. Multilevel Statistical Models, 4th ed. Chichester: John Wiley & Sons, 2011

[4] Henderson H V, Searle S R. The vec-permutation matrix, the vec operator and Kronecker products: a review. Linear and Multilinear Algebra, 1981, 9(4): 271–288

[5] Lele S, Taper M L. A composite likelihood approach to (co)variance components estimation. J Statist Plann Inference, 2002, 103(1/2): 117–135

[6] Searle S R, Casella G, McCulloch C E. Variance Components. Hoboken, NJ: John Wiley & Sons, 2006

[7] Snijders T A B, Bosker R J. Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling. London: Sage, 1999

[8] Tian M Z. Hierarchical Quantile Regression Modeling Theory. Beijing: Science Press, 2015

[9] Wang S G, Shi J H, Yin S J, Wu M X. Introduction to Linear Models. Beijing: Science Press, 2004

[10] Wang S G, Yin S J. A new estimate of the parameters in linear mixed models. Sci Sin Math, 2002, 32(5): 434–443

RIGHTS & PERMISSIONS

Higher Education Press 2023
