Flexible Factor Model for Handling Missing Data in Supervised Learning

Andriette Bekker , Farzane Hashemi , Mohammad Arashi

Communications in Mathematics and Statistics, 2023, Vol. 11, Issue 2: 477-501. DOI: 10.1007/s40304-021-00260-9

Abstract

This paper presents an extension of the factor analysis model based on the normal mean–variance mixture of the Birnbaum–Saunders distribution in the presence of nonresponse and missing data. The model is a powerful tool for capturing non-normal features in data, such as strong skewness and heavy-tailed noise. Missing data may arise from operator error or incomplete data capture and therefore cannot be ignored in factor analysis modeling. We implement an EM-type algorithm for maximum likelihood estimation and propose single imputation of missing values under a missing at random mechanism. The potential and applicability of the proposed method are illustrated by analyzing both simulated and real datasets.
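
To make the abstract's setting concrete, the following is a minimal sketch, not the authors' implementation. It assumes the usual stochastic representation of a normal mean-variance mixture, Y = mu + W*lam + sqrt(W)*L z, with W drawn from a Birnbaum-Saunders(alpha, 1) distribution, and then hides entries with a simple missing-at-random rule. The function names, parameter values, and the logistic missingness rule are all illustrative assumptions.

```python
import numpy as np

def sample_bs(alpha, beta, size, rng):
    """Draw Birnbaum-Saunders(alpha, beta) variates from standard normals."""
    z = rng.standard_normal(size)
    half = 0.5 * alpha * z
    return beta * (half + np.sqrt(half**2 + 1.0)) ** 2

def sample_nmvbs(mu, lam, sigma, alpha, n, rng):
    """Sample Y = mu + W*lam + sqrt(W) * L z, with W ~ BS(alpha, 1), L L^T = sigma."""
    mu = np.asarray(mu, dtype=float)
    lam = np.asarray(lam, dtype=float)
    w = sample_bs(alpha, 1.0, n, rng)                  # positive mixing variable
    chol = np.linalg.cholesky(np.asarray(sigma, dtype=float))
    z = rng.standard_normal((n, mu.size))
    return mu + w[:, None] * lam + np.sqrt(w)[:, None] * (z @ chol.T)

rng = np.random.default_rng(0)
# Illustrative parameters only (not taken from the paper).
y = sample_nmvbs(mu=[0.0, 0.0], lam=[2.0, -1.0], sigma=np.eye(2),
                 alpha=1.0, n=20000, rng=rng)

# Missing at random: whether column 1 is missing depends only on the
# always-observed column 0 (hypothetical logistic rule).
p_missing = 1.0 / (1.0 + np.exp(-(y[:, 0] - 3.0)))
mask = rng.random(y.shape[0]) < p_missing
y_missing = y.copy()
y_missing[mask, 1] = np.nan

# Third central moment of column 0; a positive value indicates right skew.
centered = y[:, 0] - y[:, 0].mean()
third_moment = np.mean(centered**3)
```

The skewness vector `lam` controls the direction of asymmetry, and the mixing variable `w` inflates the variance of some observations, which produces heavier tails than a Gaussian factor model would allow.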

Keywords

Automobile dataset / Asymmetry / ECME algorithm / Factor analysis model / Heavy tails / Incomplete data / Liver disorders dataset

Cite this article

Andriette Bekker, Farzane Hashemi, Mohammad Arashi. Flexible Factor Model for Handling Missing Data in Supervised Learning. Communications in Mathematics and Statistics, 2023, 11(2): 477-501. DOI: 10.1007/s40304-021-00260-9

Funding

National Research Foundation, South Africa (120839)

National Research Foundation, South Africa (71199)

Ferdowsi University of Mashhad (2/54034)
