Information-Based Optimal Subdata Selection for Large Sample Spatial Autoregression
Yunquan Song , Sijia Shen , Yaqi Liu
Communications in Mathematics and Statistics: 1-23.
Extraordinary amounts of data are being produced in almost every branch of science. Established statistical methods are often no longer feasible for very large data sets because of computational limitations. Subdata selection is considered an effective strategy for addressing this issue. In this study, we propose a novel framework for selecting subsets of data for spatial autoregression. We show that, while the information contained in subdata obtained by random sampling is limited by the size of the subset, the information contained in subdata selected under the new framework increases as the size of the full data set increases. We call the resulting approach information-based optimal subdata selection (IBOSS). The performance of the proposed approach and that of random sampling are compared under various criteria via extensive simulation studies. Theoretical results and simulations demonstrate that the IBOSS approach performs better than random subsampling. The advantages of the new approach are also illustrated through an analysis of real data.
Massive data / Information matrix / D-optimality criterion / Subdata / 62H10 / 62H12
School of Mathematical Sciences, University of Science and Technology of China and Springer-Verlag GmbH Germany, part of Springer Nature
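The contrast drawn in the abstract, between subdata whose information is capped by the subset size and subdata whose information grows with the full data size, can be illustrated with a minimal sketch. It is not the authors' procedure for spatial autoregression: it assumes an ordinary linear model with intercept, uses the log-determinant of the information matrix (the D-optimality criterion named in the keywords) as the measure, and applies a simple extreme-value selection rule of the kind commonly associated with IBOSS. All function names, the simulated Gaussian covariates, and the sizes are hypothetical choices made for illustration.

```python
import numpy as np


def random_subdata(X, k, rng):
    """Return k row indices drawn uniformly without replacement."""
    return rng.choice(X.shape[0], size=k, replace=False)


def extreme_value_subdata(X, k):
    """Return k row indices chosen by an extreme-value rule (assumed, illustrative).

    For each covariate in turn, take the r smallest and r largest values
    among the rows not yet selected, with r = k // (2 * p).
    """
    n, p = X.shape
    r = k // (2 * p)
    if r < 1 or 2 * r * p > n:
        raise ValueError("k must satisfy 2*p <= k <= n")
    available = np.ones(n, dtype=bool)
    chosen = []
    for j in range(p):
        idx = np.flatnonzero(available)
        order = idx[np.argsort(X[idx, j])]  # available rows sorted by column j
        picked = np.concatenate([order[:r], order[-r:]])
        available[picked] = False
        chosen.append(picked)
    return np.concatenate(chosen)


def log_det_information(X):
    """log det(Z^T Z) with Z = [1, X]: the D-criterion for a linear model."""
    Z = np.column_stack([np.ones(X.shape[0]), X])
    _, logdet = np.linalg.slogdet(Z.T @ Z)
    return logdet


# Hypothetical simulation: fixed subdata size k, growing full data size n.
rng = np.random.default_rng(0)
p, k = 3, 60
results = {}
for n in (2_000, 200_000):
    X = rng.standard_normal((n, p))
    rand_idx = random_subdata(X, k, rng)
    ext_idx = extreme_value_subdata(X, k)
    results[n] = (
        log_det_information(X[rand_idx]),
        log_det_information(X[ext_idx]),
    )
    print(n, results[n])
```

In this sketch the subdata size `k` stays fixed while `n` grows, so any gain in the second printed value comes from the larger pool of extreme observations rather than from using more rows.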