1. Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing 100872, China
2. School of Statistics, Lanzhou University of Finance and Economics, Lanzhou 730020, China
3. Xinjiang Social & Economic Statistics Research Center, School of Statistics and Information, Xinjiang University of Finance and Economics, Urumqi 830012, China
mztian@ruc.edu.cn
Abstract
In many fields, we need to deal with hierarchically structured data. For this kind of data, the hierarchical mixed effects model captures the correlation of observations within the same level by building models for the regression coefficients. Because of the complexity of the random part of this model, finding an effective method to estimate the covariance matrix is an appealing issue. The iterative generalized least squares estimation method was proposed by Goldstein in 1986 and applied to a special case of the hierarchical model. In this paper, we extend the method to the general hierarchical mixed effects model, derive its expressions in detail, and apply it to economic examples.
Chunyu WANG, Maozai TIAN. The large sample property of the iterative generalized least squares estimation for hierarchical mixed effects model. Front. Math. China, 2023, 18(5): 327-339. DOI:10.3868/s140-DDD-023-0023-x
1 Introduction
For hierarchically structured data, an important feature is the correlation between observations within the same level. For example, if we examine the math scores of students from different classes, there should clearly be some correlation between the scores of students in the same class, and this intra-level correlation must be taken into account in the subsequent analysis. At this point the mutual independence of the random terms assumed by the traditional regression model no longer holds. The hierarchical model reflects the intra-level correlation of the data by introducing the same random term within the same level [7, 8]. As an example, take the simplest hierarchical model (the null model):
$$ y_{ij} = \beta_0 + u_j + e_{ij}, $$
where $i$ and $j$ are the subscripts of the first and second level, respectively, and the random terms are mutually independent and satisfy $u_j \sim N(0, \sigma_u^2)$, $e_{ij} \sim N(0, \sigma_e^2)$, $\operatorname{Cov}(u_j, e_{ij}) = 0$. In this model, different individuals in the same second-level unit share the same random term $u_j$, and the correlation coefficient between them is
$$ \rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}. $$
It can be seen that the weight of $\sigma_u^2$ in the total variance reflects the intra-level correlation. Thus, estimating the covariance matrix of the hierarchical model is an unavoidable problem. Well-known and commonly used estimation methods include maximum likelihood estimation (MLE), restricted maximum likelihood estimation (RMLE), and empirical Bayes estimation (EB). These methods have some shortcomings: MLE and RMLE involve differentiation with respect to vectors, which can be difficult to handle when the random part of the model becomes complex; for EB, the effect of estimating the prior distribution from empirical data depends on the choice of estimator [3]. In recent years, scholars have made new explorations around the estimation of the covariance matrix. Lele and Taper [5] (2002) proposed the method of composite likelihood (MCL), which reduces the complexity of the algorithm; Wang and Yin [10] (2002) proposed the spectral decomposition method for a class of hierarchical models with balanced data to obtain an explicit solution for the variance components.
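For a brief numerical illustration, with values chosen purely for illustration: if $\sigma_u^2 = 1$ and $\sigma_e^2 = 3$, then $\rho = 1/(1+3) = 0.25$, that is, a quarter of the total variation in the scores is attributable to differences between classes.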
The iterative generalized least squares estimation used in this paper was originally proposed by Goldstein [2] (1986) for the variance component model, i.e.,
$$ y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + e_{ij}. $$
In this model, the random part is independent of the explanatory variable $x_{ij}$. Here, we derive specific expressions for parameter estimation based on iterative generalized least squares for a more general hierarchical model containing random effects (i.e., random terms of the form $u_{1j}x_{ij}$ that interact with the explanatory variable). We then prove the relevant properties and show the performance of the estimation through simulations.
2 Model
Consider a general two-level mixed-effects model:
$$ y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + e_{ij}, \qquad \beta_{0j} = \beta_0 + u_{0j}, \qquad \beta_{1j} = \beta_1 + u_{1j}, \qquad (2.1) $$
where $i$ is the subscript of the first level, $i = 1, \dots, n_j$; $j$ is the subscript of the second level, $j = 1, \dots, m$. Specifically, we can consider the second level as $m$ schools and the first level as the students in each school, so that the $j$-th school has $n_j$ students. In the model, $y_{ij}$ is the response variable, $x_{ij}$ is the explanatory variable, and $u_{0j}$, $u_{1j}$, $e_{ij}$ are the random terms that satisfy
$$ u_{0j} \sim N(0, \sigma_{u0}^2), \quad u_{1j} \sim N(0, \sigma_{u1}^2), \quad \operatorname{Cov}(u_{0j}, u_{1j}) = \sigma_{u01}, \quad e_{ij} \sim N(0, \sigma_e^2), $$
where $(u_{0j}, u_{1j})$ and $e_{ij}$ are independent. Thus, we can obtain the merged form:
$$ y_{ij} = \beta_0 + \beta_1 x_{ij} + u_{0j} + u_{1j} x_{ij} + e_{ij}. \qquad (2.2) $$
In this way, $y_{ij}$ is decomposed into the sum of a fixed part $\beta_0 + \beta_1 x_{ij}$ and a random part $u_{0j} + u_{1j} x_{ij} + e_{ij}$. Thereby, given $x_{ij}$, the variance of $y_{ij}$ is
$$ \operatorname{Var}(y_{ij}) = \sigma_{u0}^2 + 2\sigma_{u01} x_{ij} + \sigma_{u1}^2 x_{ij}^2 + \sigma_e^2. $$
The covariance between different individuals in the same school is
$$ \operatorname{Cov}(y_{ij}, y_{i'j}) = \sigma_{u0}^2 + \sigma_{u01}(x_{ij} + x_{i'j}) + \sigma_{u1}^2 x_{ij} x_{i'j}, \qquad i \ne i'. $$
Thus, for a given school $j$, the covariance matrix is
$$ V_j = \operatorname{Cov}(\mathbf{y}_j) = \sigma_{u0}^2\, \mathbf{1}_{n_j}\mathbf{1}_{n_j}^{\mathrm T} + \sigma_{u01}\big(\mathbf{x}_j \mathbf{1}_{n_j}^{\mathrm T} + \mathbf{1}_{n_j}\mathbf{x}_j^{\mathrm T}\big) + \sigma_{u1}^2\, \mathbf{x}_j \mathbf{x}_j^{\mathrm T} + \sigma_e^2 I_{n_j}, \qquad (2.3) $$
where $\mathbf{y}_j = (y_{1j}, \dots, y_{n_j j})^{\mathrm T}$, $\mathbf{x}_j = (x_{1j}, \dots, x_{n_j j})^{\mathrm T}$, and $\mathbf{1}_{n_j}$ is the $n_j$-vector of ones.
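As a minimal computational sketch of how the within-school covariance block in (2.3) can be assembled from the variance-covariance components (the function name and the numerical values below are illustrative assumptions, not part of the original):

```python
import numpy as np

def school_cov_block(x_j, s_u0, s_u01, s_u1, s_e):
    """Covariance block V_j of one school as in (2.3):
    diagonal entries  s_u0 + 2*s_u01*x + s_u1*x^2 + s_e,
    off-diagonal ones s_u0 + s_u01*(x + x') + s_u1*x*x'."""
    x_j = np.asarray(x_j, dtype=float)
    n_j = len(x_j)
    ones = np.ones(n_j)
    return (s_u0 * np.outer(ones, ones)
            + s_u01 * (np.outer(x_j, ones) + np.outer(ones, x_j))
            + s_u1 * np.outer(x_j, x_j)
            + s_e * np.eye(n_j))

# illustrative values only
V_j = school_cov_block([0.2, -1.0, 0.5], s_u0=1.0, s_u01=0.3, s_u1=0.5, s_e=2.0)
```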
In model (2.1), we are interested in the regression coefficients $\beta = (\beta_0, \beta_1)^{\mathrm T}$ and the covariance matrix $V = \operatorname{diag}(V_1, \dots, V_m)$. Notice that in (2.3), each element of the covariance matrix can be represented by the variance-covariance components $\theta = (\sigma_{u0}^2, \sigma_{u01}, \sigma_{u1}^2, \sigma_e^2)^{\mathrm T}$. Therefore, the parameters actually needed in the model are $(\beta, \theta)$. In the following, we give the iterative generalized least squares estimates of these parameters.
3 Iterative generalized least squares estimation
Rewriting (2.2) in matrix form, we have
$$ Y = X\beta + \varepsilon, \qquad (3.1) $$
where $X$ is the $n \times p$ design matrix with full column rank, $n = \sum_{j=1}^m n_j$, and $\varepsilon$ is the error vector. Given that the covariance matrix $V = \operatorname{Cov}(\varepsilon)$ is known and non-singular, the generalized least squares estimate of $\beta$ is
$$ \hat\beta = (X^{\mathrm T} V^{-1} X)^{-1} X^{\mathrm T} V^{-1} Y. \qquad (3.2) $$
The estimated covariance is
$$ \operatorname{Cov}(\hat\beta) = (X^{\mathrm T} V^{-1} X)^{-1}. $$
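A small sketch of the generalized least squares step (3.2), assuming a stacked design matrix X, response Y and a known covariance matrix V are available as NumPy arrays (the names are illustrative):

```python
import numpy as np

def gls_beta(X, Y, V):
    """Generalized least squares estimate of beta as in (3.2),
    together with its estimated covariance (X' V^{-1} X)^{-1}."""
    Vinv_X = np.linalg.solve(V, X)       # V^{-1} X without forming V^{-1} explicitly
    XtVinvX = X.T @ Vinv_X
    beta_hat = np.linalg.solve(XtVinvX, Vinv_X.T @ Y)
    cov_beta = np.linalg.inv(XtVinvX)
    return beta_hat, cov_beta
```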
On the other hand, since $E(\tilde\varepsilon \tilde\varepsilon^{\mathrm T}) = V$ with $\tilde\varepsilon = Y - X\beta$, we can build a regression model for the covariance matrix when $\beta$ is known:
$$ \tilde\varepsilon \tilde\varepsilon^{\mathrm T} = V + E, \qquad (3.3) $$
where the covariance matrix $V = \operatorname{diag}(V_1, \dots, V_m)$ and $E$ is the error matrix. Any element of $V$ can be expressed as a linear form of the unknown parameter $\theta = (\sigma_{u0}^2, \sigma_{u01}, \sigma_{u1}^2, \sigma_e^2)^{\mathrm T}$:
$$ V_j = \sigma_{u0}^2 J_{n_j} + \sigma_{u01}\big(\mathbf{x}_j \mathbf{1}_{n_j}^{\mathrm T} + \mathbf{1}_{n_j}\mathbf{x}_j^{\mathrm T}\big) + \sigma_{u1}^2\, \mathbf{x}_j \mathbf{x}_j^{\mathrm T} + \sigma_e^2 I_{n_j}, \qquad (3.4) $$
where $J_{n_j}$ is an $n_j \times n_j$ matrix with all elements equal to 1 and $I_{n_j}$ is the $n_j$-dimensional identity matrix. Substituting (3.4) into (3.3) and applying the vec operation to both sides, we get
$$ \operatorname{vec}\big(\tilde\varepsilon \tilde\varepsilon^{\mathrm T}\big) = \sigma_{u0}^2 \operatorname{vec}(G_1) + \sigma_{u01} \operatorname{vec}(G_2) + \sigma_{u1}^2 \operatorname{vec}(G_3) + \sigma_e^2 \operatorname{vec}(G_4) + \operatorname{vec}(E), $$
where $G_1, \dots, G_4$ denote the block-diagonal matrices built from the four component matrices in (3.4). The above equation can be organized in the following form:
$$ Y^* = X^* \theta + \varepsilon^*, \qquad (3.5) $$
where $Y^* = \operatorname{vec}(\tilde\varepsilon \tilde\varepsilon^{\mathrm T})$, $X^* = \big(\operatorname{vec}(G_1), \dots, \operatorname{vec}(G_4)\big)$ is the design matrix with full column rank, and $\varepsilon^* = \operatorname{vec}(E)$ is the error vector. In this way, we transform the problem of estimating the covariance matrix into the generalized least squares estimation of the regression coefficient $\theta$ in the linear regression model (3.5). Thus, the generalized least squares estimate of $\theta$ is
$$ \hat\theta = (X^{*\mathrm T} V^{*-1} X^*)^{-1} X^{*\mathrm T} V^{*-1} Y^*, \qquad (3.6) $$
where $V^*$ is the covariance matrix of $Y^*$. It can be shown that in the normal case (for the proof, see the work of Searle et al. [6])
$$ V^* = (I_{n^2} + K_n)(V \otimes V), \qquad (3.7) $$
where $\otimes$ denotes the Kronecker product and $K_n$ is the vec-permutation matrix. Further, substituting (3.7) into (3.6) and using the Kronecker product and the relevant properties of the vec-permutation matrix [4, 9], the estimate of $\theta$ in the normal case can be obtained:
$$ \hat\theta = \big(X^{*\mathrm T}(V^{-1} \otimes V^{-1}) X^*\big)^{-1} X^{*\mathrm T}(V^{-1} \otimes V^{-1}) Y^*. \qquad (3.8) $$
The estimated covariance matrix is (see the appendix in Section 7 for the derivation details):
$$ \operatorname{Cov}(\hat\theta) = 2\big(X^{*\mathrm T}(V^{-1} \otimes V^{-1}) X^*\big)^{-1}. \qquad (3.9) $$
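To make the vectorized regression (3.3)-(3.5) and the normal-case estimate (3.8) concrete, the following naive sketch computes one update of $\theta$ directly with Kronecker products (suitable only for very small $n$; the residual vector r, the component matrices G_k from (3.4) and the current V are assumed to be given, and the names are our own):

```python
import numpy as np

def theta_step_direct(r, G_list, V):
    """One GLS step for theta: the columns of Xstar are vec(G_k) (cf. (3.4)-(3.5)),
    Ystar = vec(r r'), and the weight is V^{-1} (x) V^{-1} as in (3.8).
    All matrices here are symmetric, so the row-major reshape agrees with vec."""
    Ystar = np.outer(r, r).reshape(-1)
    Xstar = np.column_stack([np.asarray(G).reshape(-1) for G in G_list])
    Vinv = np.linalg.inv(V)
    W = np.kron(Vinv, Vinv)
    lhs = Xstar.T @ W @ Xstar
    rhs = Xstar.T @ W @ Ystar
    theta_hat = np.linalg.solve(lhs, rhs)
    cov_theta = 2.0 * np.linalg.inv(lhs)    # estimated covariance, cf. (3.9)
    return theta_hat, cov_theta
```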
However, in practice, both $\beta$ and $V$ are unknown. Therefore, in (3.2), we replace the true $V$ with the current estimate $\hat V$ of the covariance matrix to find $\hat\beta$; similarly, in (3.8), we replace the true $\beta$ and $V$ with the current estimates $\hat\beta$ and $\hat V$ to find $\hat\theta$, and then obtain the updated $\hat V$. This iterative updating continues until convergence, which is the idea of iterative generalized least squares. In summary, the steps of iterative generalized least squares estimation can be summarized as follows (a code sketch of the iteration is given after the steps).
Given the initial value $\hat\theta^{(0)}$, $\hat V^{(0)}$ can be obtained according to equation (3.4); given the convergence threshold $\delta > 0$, the iteration proceeds as follows.
Step 1: $\hat\beta^{(t+1)} = \big(X^{\mathrm T} (\hat V^{(t)})^{-1} X\big)^{-1} X^{\mathrm T} (\hat V^{(t)})^{-1} Y$;
Step 2: compute $\hat\theta^{(t+1)}$ from (3.8) with $\beta$ and $V$ replaced by $\hat\beta^{(t+1)}$ and $\hat V^{(t)}$, and then obtain $\hat V^{(t+1)}$ from $\hat\theta^{(t+1)}$ according to (3.4);
Step 3: the iteration ends when $\|\hat\beta^{(t+1)} - \hat\beta^{(t)}\|_\infty < \delta$ and $\|\hat\theta^{(t+1)} - \hat\theta^{(t)}\|_\infty < \delta$ hold simultaneously; otherwise, go to Step 1. Here $\|\cdot\|_\infty$ is the infinity norm: for any given $P$-dimensional vector $a = (a_1, \dots, a_P)^{\mathrm T}$, $\|a\|_\infty = \max_{1 \le k \le P} |a_k|$.
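The following is a minimal, self-contained sketch of the whole iteration for the random-slope model (2.2). It exploits the block-diagonal structure of $V$, so that the quantities in (3.2) and (3.8) reduce to school-by-school sums of traces. The function and variable names are our own, and the starting values are crude and purely illustrative.

```python
import numpy as np

def igls_fit(y_groups, x_groups, tol=1e-6, max_iter=200):
    """IGLS sketch for y_ij = b0 + b1 x_ij + u0_j + u1_j x_ij + e_ij,
    theta = (Var u0, Cov(u0,u1), Var u1, Var e).
    y_groups, x_groups: lists of 1-D arrays, one per second-level unit."""
    Xg = [np.column_stack([np.ones(len(x)), np.asarray(x, float)]) for x in x_groups]
    yg = [np.asarray(y, float) for y in y_groups]

    def components(x):
        # component matrices G_jk with V_j = sum_k theta_k * G_jk, cf. (3.4)
        one = np.ones(len(x))
        return [np.outer(one, one),
                np.outer(x, one) + np.outer(one, x),
                np.outer(x, x),
                np.eye(len(x))]
    Gg = [components(np.asarray(x, float)) for x in x_groups]

    theta = np.array([1.0, 0.0, 1.0, 1.0])   # crude starting values
    beta = np.zeros(2)
    for _ in range(max_iter):
        # no safeguard against a non-positive-definite V during iteration; a sketch only
        Vinv = [np.linalg.inv(sum(t * G for t, G in zip(theta, Gj))) for Gj in Gg]

        # Step 1: GLS estimate of beta, block-diagonal version of (3.2)
        A = sum(X.T @ Vi @ X for X, Vi in zip(Xg, Vinv))
        b = sum(X.T @ Vi @ y for X, Vi, y in zip(Xg, Vinv, yg))
        beta_new = np.linalg.solve(A, b)

        # Step 2: GLS estimate of theta from the residual cross-products,
        # with the normal-case weight V^{-1} (x) V^{-1} applied block by block, cf. (3.8)
        M = np.zeros((4, 4))
        c = np.zeros(4)
        for X, Vi, y, Gj in zip(Xg, Vinv, yg, Gg):
            r = y - X @ beta_new
            R = np.outer(r, r)
            for k, Gk in enumerate(Gj):
                c[k] += np.trace(Vi @ Gk @ Vi @ R)
                for l, Gl in enumerate(Gj):
                    M[k, l] += np.trace(Vi @ Gk @ Vi @ Gl)
        theta_new = np.linalg.solve(M, c)

        # Step 3: check convergence in the infinity norm
        if (np.max(np.abs(beta_new - beta)) < tol
                and np.max(np.abs(theta_new - theta)) < tol):
            return beta_new, theta_new
        beta, theta = beta_new, theta_new
    return beta, theta

# usage: beta_hat, theta_hat = igls_fit(y_groups, x_groups)
```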
It can be seen that iterative generalized least squares achieves the estimation of both the regression coefficients and the covariance matrix through iteration based on generalized least squares estimation. The essential idea in estimating the covariance matrix is to use the sample residual cross-product matrix $\tilde\varepsilon\tilde\varepsilon^{\mathrm T}$ to estimate the covariance matrix $V$. For the hierarchical mixed effects model in this paper, the covariance matrix can be fully represented by the intra- and inter-level variance-covariance components. Therefore, estimating the covariance matrix is equivalent to estimating the regression coefficients $\theta$ in the linear regression model (3.5).
4 Properties of iterative generalized least squares estimation
Proposition 1. For model (2.1) and its matrix forms (3.1) and (3.5), suppose that $Y$ obeys a multivariate normal distribution and that the design matrices $X$ and $X^*$ have full column rank. Then for the parameters $\beta$ and $\theta$, iterative generalized least squares estimation is equivalent to maximum likelihood estimation.
Proof. For the $n$-dimensional normal distribution $Y \sim N(X\beta, V)$, the log-likelihood function is
$$ \ell(\beta, \theta) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log|V| - \frac{1}{2}(Y - X\beta)^{\mathrm T} V^{-1} (Y - X\beta). \qquad (4.1) $$
For the regression coefficient $\beta$, setting $\partial \ell / \partial \beta = 0$ yields the estimating equation
$$ X^{\mathrm T} V^{-1} (Y - X\beta) = 0. $$
Rearranging gives equation (3.2). It can be seen that the maximum likelihood estimation of $\beta$ is equivalent to the iterative generalized least squares estimation.
The estimation of the variance components is considered below. The log-likelihood function (4.1) is rewritten as
$$ \ell(\beta, \theta) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log|V| - \frac{1}{2}\operatorname{tr}\big(V^{-1} \tilde\varepsilon \tilde\varepsilon^{\mathrm T}\big), \qquad (4.2) $$
where $\tilde\varepsilon = Y - X\beta$. Since $\theta$ is the parameter in the covariance matrix $V$, the derivative of (4.2) with respect to $\theta_k$ is
$$ \frac{\partial \ell}{\partial \theta_k} = -\frac{1}{2}\operatorname{tr}\Big(V^{-1}\frac{\partial V}{\partial \theta_k}\Big) + \frac{1}{2}\operatorname{tr}\Big(V^{-1}\frac{\partial V}{\partial \theta_k} V^{-1}\tilde\varepsilon\tilde\varepsilon^{\mathrm T}\Big), \qquad (4.3) $$
and setting it equal to zero gives the estimating equation for $\theta$ based on maximum likelihood estimation:
$$ \operatorname{tr}\Big(V^{-1}\frac{\partial V}{\partial \theta_k} V^{-1}\big(\tilde\varepsilon\tilde\varepsilon^{\mathrm T} - V\big)\Big) = 0, \qquad k = 1, \dots, 4. \qquad (4.4) $$
In iterative generalized least squares estimation, $\hat\theta$ is obtained from equation (3.6), which can be obtained by minimizing
$$ (Y^* - X^*\theta)^{\mathrm T} V^{*-1} (Y^* - X^*\theta). \qquad (4.5) $$
When (3.7) holds, minimizing equation (4.5) is equivalent to minimizing
$$ (Y^* - X^*\theta)^{\mathrm T} (V^{-1} \otimes V^{-1}) (Y^* - X^*\theta). \qquad (4.6) $$
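For reference, the standard vectorization and trace identities used in the derivation below (the corresponding conclusions are collected in the appendix of Section 7) are
$$ \operatorname{vec}(ABC) = (C^{\mathrm T} \otimes A)\operatorname{vec}(B), \qquad \operatorname{vec}(A)^{\mathrm T}\operatorname{vec}(B) = \operatorname{tr}(A^{\mathrm T} B), $$
for conformable matrices $A$, $B$, $C$.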
It should be noted that we use two different symbols, $V(\theta)$ and $V$, to indicate whether the covariance matrix is related to the parameter $\theta$ being estimated: the former is the expression of the covariance matrix before the vectorization, which depends on $\theta$; the latter is the true covariance matrix in the model, which is treated as a constant in the derivation below. Denote expression (4.6) by $Q(\theta)$. From the relevant properties of the matrix vectorization and trace operations, we get
$$ Q(\theta) = \operatorname{tr}\Big(V^{-1}\big(\tilde\varepsilon\tilde\varepsilon^{\mathrm T} - V(\theta)\big) V^{-1}\big(\tilde\varepsilon\tilde\varepsilon^{\mathrm T} - V(\theta)\big)\Big). $$
Taking the partial derivative of the above equation with respect to $\theta_k$, we get
$$ \frac{\partial Q(\theta)}{\partial \theta_k} = -2\operatorname{tr}\Big(V^{-1}\frac{\partial V(\theta)}{\partial \theta_k} V^{-1}\big(\tilde\varepsilon\tilde\varepsilon^{\mathrm T} - V(\theta)\big)\Big). $$
Setting the partial derivative equal to 0, the estimating equation for $\theta$ is obtained:
$$ \operatorname{tr}\Big(V^{-1}\frac{\partial V(\theta)}{\partial \theta_k} V^{-1}\big(\tilde\varepsilon\tilde\varepsilon^{\mathrm T} - V(\theta)\big)\Big) = 0, \qquad k = 1, \dots, 4. $$
This is the same as (4.4). Therefore, for $\theta$, the maximum likelihood estimate is equivalent to the iterative generalized least squares estimate.
From the above conclusion, it can be deduced that the properties of maximum likelihood estimation carry over to iterative generalized least squares estimation in the normal case. We know that the maximum likelihood estimate of a variance is biased; for example, when $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$, the maximum likelihood estimates of $\mu$ and $\sigma^2$ are $\bar X$ and $\frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2$, and the latter is biased. Therefore, the estimate of the variance obtained using iterative generalized least squares is also biased.
Regarding the large-sample properties, we first consider the large-sample properties of maximum likelihood estimation and then pass to iterative generalized least squares estimation through the above equivalence. For the hierarchical model, since the second-level samples are independent of each other, the likelihood function is
$$ L(\eta) = \prod_{j=1}^m f_j(\mathbf{y}_j; \eta), $$
where $\eta = (\beta^{\mathrm T}, \theta^{\mathrm T})^{\mathrm T}$ and $f_j$ is the density function of $\mathbf{y}_j$. It should be noted that the samples are independent but not identically distributed, depending on a common parameter vector $\eta$. For the maximum likelihood estimate $\hat\eta$ in this case, Bradley and Gart [1] prove that $\hat\eta$ is consistent and asymptotically normal if the following assumptions hold.
Hypothesis I (i) For any , there exist , ; .
Hypothesis I (ii) For any density function , and any , there are
where are integrable in , and
Hypothesis II (i) For the sequence of density functions , we have
where .
Hypothesis II (ii) For the sequence of density functions , we have
where ; and exists, .
Hypothesis II (iii) For , we have
where ; and , is a positive constant.
Hypothesis III
where .
Theorem 1. Let $Y$ obey a multivariate normal distribution, $\hat\eta$ be the iterative generalized least squares estimate of the parameter vector $\eta$, and $\eta_0$ be the true value. Then
(1) under the conditions that Hypotheses I and II hold, $\hat\eta$ is consistent: $\hat\eta \xrightarrow{\;P\;} \eta_0$;
(2) under the conditions that Hypotheses I, II, and III hold, $\hat\eta$ is asymptotically normal, with the corresponding asymptotic covariance matrix.
Proof. The theorem follows directly from the large-sample properties of the maximum likelihood estimate given by Bradley and Gart [1] (1962), together with the equivalence established in Proposition 1.
5 Simulation
To test the estimation performance of the iterative generalized least squares method, we first simulate model (2.1). Given the true values of the regression coefficients $\beta_0, \beta_1$ and the variance-covariance components $\sigma_{u0}^2, \sigma_{u01}, \sigma_{u1}^2, \sigma_e^2$, define $V$ in the same way as in equation (2.3). The explanatory variable $x_{ij}$ is generated from the standard normal distribution. The random term of the second level, $(u_{0j}, u_{1j})$, is generated from a two-dimensional normal distribution with mean zero and covariance determined by $\sigma_{u0}^2$, $\sigma_{u01}$, $\sigma_{u1}^2$; the random term of the first level, $e_{ij}$, is generated from a one-dimensional normal distribution with mean zero and variance $\sigma_e^2$. Taking a given number of schools $m$, the number of students $n_j$ in each school is generated from the Poisson distribution with mean 25. Thus, $y_{ij}$ is obtained according to (2.2). For illustration purposes, as above, we consider the first and second levels of the model as the student level and the school level, respectively, so that $y_{ij}$ is the response variable of the $i$-th student in the $j$-th school. More intuitively, we plot the scatter plots for two of the schools in Fig.1.
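A sketch of the data-generating step described above; the specific parameter values and the number of schools below are illustrative assumptions only (they are not the values used for Tab.1), since the purpose is just to show the mechanism:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(m, beta0, beta1, s_u0, s_u01, s_u1, s_e):
    """Generate data from model (2.2) with m schools and Poisson(25) school sizes."""
    Sigma_u = np.array([[s_u0, s_u01], [s_u01, s_u1]])   # covariance of (u0_j, u1_j)
    y_groups, x_groups = [], []
    for _ in range(m):
        n_j = max(rng.poisson(25), 2)                    # at least two students per school
        x = rng.standard_normal(n_j)                     # first-level explanatory variable
        u0, u1 = rng.multivariate_normal([0.0, 0.0], Sigma_u)
        e = rng.normal(0.0, np.sqrt(s_e), n_j)           # first-level random term
        y = beta0 + beta1 * x + u0 + u1 * x + e
        y_groups.append(y)
        x_groups.append(x)
    return y_groups, x_groups

# illustrative parameter values
y_groups, x_groups = simulate(m=50, beta0=1.0, beta1=2.0,
                              s_u0=1.0, s_u01=0.3, s_u1=0.5, s_e=2.0)
```

These groups can then be passed to an IGLS routine such as the sketch in Section 3.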
As can be seen in Fig.1, the intercept and slope are not the same for each school, because different schools correspond to different school-level random terms $(u_{0j}, u_{1j})$, so the school-specific coefficients are perturbed around $(\beta_0, \beta_1)$. We estimated the regression coefficients and variance components using iterative generalized least squares, and the estimation results are shown in Tab.1.
It can be seen that the estimated values of the parameters deviate little from the true values and the standard deviations of the estimates are small, which indicates that the parameter estimates obtained by the iterative generalized least squares method fluctuate within a small range around the true values, so the method performs well for model (2.1).
Further, we generalize the model by introducing an explanatory variable $w_j$ at the second level, which gives the following hierarchical model:
$$ y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + e_{ij}, \qquad \beta_{0j} = \beta_0 + \gamma_0 w_j + u_{0j}, \qquad \beta_{1j} = \beta_1 + \gamma_1 w_j + u_{1j}. $$
The assumptions about the distribution and independence of the random terms in the model are the same as in model (2.1). Combining the two levels, we obtain the following form:
$$ y_{ij} = \beta_0 + \beta_1 x_{ij} + \gamma_0 w_j + \gamma_1 w_j x_{ij} + u_{0j} + u_{1j} x_{ij} + e_{ij}. $$
Given the true values of the parameters, the explanatory variables are generated independently from the standard normal distribution. Multiple simulations are performed and the estimation results are averaged to obtain Tab.2.
Tab.2 shows that the estimates of the parameters are close to the true values and the standard deviations are also small, which shows that the iterative generalized least squares method is also applicable to the extended model.
6 Example
In this section, we apply the hierarchical mixed-effects model to analyze the relationship between GDP and total foreign trade (TMX) for several major cities with export-oriented economies. Seven coastal cities with export-oriented economies, namely Tianjin, Shanghai, Dalian, Qingdao, Guangzhou, Shenzhen and Ningbo, are selected as the study subjects, and the GDP and TMX indicators of the seven cities in recent years are recorded based on the data released by the National Bureau of Statistics. For Tianjin and Shanghai, we record the data for a total of 25 years from 1990 to 2014, and for the remaining five cities, we record the data for a total of 9 years from 2006 to 2014. It should be noted that this imbalance arises because data for municipalities directly under the central government are more accessible and cover a longer period than those of other cities. However, the model and estimation method in this paper place no requirement on whether the data are balanced, and can therefore be used to analyze this data set. Using the economic indicators of different cities in different years as the first level, the first-level model is developed as follows:
$$ \mathrm{GDP}_{ij} = \beta_{0j} + \beta_{1j}\big(\mathrm{TMX}_{ij} - \overline{\mathrm{TMX}}\big) + e_{ij}, \qquad (6.1) $$
where $i = 1, \dots, n_j$ with $n_j = 9$ or 25. Since the data are centered, the intercept term $\beta_{0j}$ represents the GDP value of city $j$ when its TMX is at the overall average level $\overline{\mathrm{TMX}}$, which should be related to the city's own average level $\overline{\mathrm{TMX}}_j$. Therefore, when building a linear model for the intercept term, we introduce $\overline{\mathrm{TMX}}_j$ as an explanatory variable to build the second-level model:
$$ \beta_{0j} = \beta_0 + \beta_{01}\big(\overline{\mathrm{TMX}}_j - \overline{\mathrm{TMX}}\big) + u_{0j}, \qquad \beta_{1j} = \beta_1 + u_{1j}. \qquad (6.2) $$
Combining (6.1) and (6.2), we get
$$ \mathrm{GDP}_{ij} = \beta_0 + \beta_{01}\big(\overline{\mathrm{TMX}}_j - \overline{\mathrm{TMX}}\big) + \beta_1\big(\mathrm{TMX}_{ij} - \overline{\mathrm{TMX}}\big) + u_{0j} + u_{1j}\big(\mathrm{TMX}_{ij} - \overline{\mathrm{TMX}}\big) + e_{ij}. \qquad (6.3) $$
The regression coefficients and variance-covariance components in model (6.3) were estimated using the iterative generalized least squares method, and the results are shown in Tab.3.
From Tab.3, we can see that the estimated coefficient of the city's average TMX level in the intercept equation is negative, which means that if a city's average level of foreign trade has been high in recent years, this lowers the city's GDP in the current year; that is, an economy that depends excessively on foreign trade experiences a negative impact on its development. In addition, combining the estimated expectation and standard deviation of the slope in Tab.3, the effect of TMX on GDP across cities lies roughly within an interval that is entirely positive. This indicates that the effect of TMX on GDP is positive for every city, although it is higher in some cities and can reach about four times the lower values. Furthermore, the estimated correlation coefficient between the city-level random terms is positive, so if a city's intercept term receives a positive random perturbation, the effect of TMX on GDP in that city also tends to be larger.
7 Appendix
Conclusions related to matrix derivatives
Conclusions related to matrix vectorization and trace operations
Vector permutation matrix
Definition: If for any given $m \times n$ matrix $A$ it always holds that $K_{mn}\operatorname{vec}(A) = \operatorname{vec}(A^{\mathrm T})$, then the $mn \times mn$ matrix $K_{mn}$ is said to be a vec-permutation matrix.
Properties: (1) , (2) , (3) , (4) .
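As a small computational check of the defining property (the function names below are our own):

```python
import numpy as np

def vec(A):
    """Column-stacking vectorization vec(A)."""
    return np.asarray(A).reshape(-1, order="F")

def vec_permutation(m, n):
    """The mn x mn vec-permutation matrix K_{mn} with K_{mn} vec(A) = vec(A.T)."""
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            # row: position of A[i, j] in vec(A.T); column: its position in vec(A)
            K[i * n + j, j * m + i] = 1.0
    return K

# check the defining property on a random 3 x 2 matrix
A = np.random.randn(3, 2)
K = vec_permutation(3, 2)
assert np.allclose(K @ vec(A), vec(A.T))
```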
From the above properties, the following conclusions can be drawn:
The derivation of Eq. (3.9). Since
$$ \operatorname{Cov}(\hat\theta) = \big(X^{*\mathrm T} W X^*\big)^{-1} X^{*\mathrm T} W\, V^*\, W X^* \big(X^{*\mathrm T} W X^*\big)^{-1}, \qquad W = V^{-1} \otimes V^{-1}, $$
note that $K_n (V^{-1} \otimes V^{-1}) = (V^{-1} \otimes V^{-1}) K_n$ and $V^* = (I_{n^2} + K_n)(V \otimes V)$, then
$$ W V^* W = (V^{-1} \otimes V^{-1})(I_{n^2} + K_n), $$
where $K_n$ is the vec-permutation matrix.
Here, $G_1, G_2, G_3, G_4$ correspond to the design matrices of $\sigma_{u0}^2, \sigma_{u01}, \sigma_{u1}^2, \sigma_e^2$, respectively, and all are symmetric matrices, so that $K_n \operatorname{vec}(G_k) = \operatorname{vec}(G_k)$ and hence $K_n X^* = X^*$. Thus:
$$ X^{*\mathrm T} W V^* W X^* = 2 X^{*\mathrm T} (V^{-1} \otimes V^{-1}) X^*, \qquad \operatorname{Cov}(\hat\theta) = 2\big(X^{*\mathrm T}(V^{-1} \otimes V^{-1}) X^*\big)^{-1}. $$
[9] Wang S G, Shi J H, Yin S J, Wu M X. Introduction to Linear Models. Beijing: Science Press, 2004
[10] Wang S G, Yin S J. A new estimate of the parameters in linear mixed models. Sci Sin Math, 2002, 32(5): 434–443
© Higher Education Press 2023