Variable importance-weighted Random Forests

Yiyi Liu, Hongyu Zhao

Quant. Biol., 2017, 5(4): 338–351. DOI: 10.1007/s40484-017-0121-6

RESEARCH ARTICLE


Abstract

Background: Random Forests is a popular classification and regression method that has proven powerful for various prediction problems in biological studies. However, its performance often deteriorates when the number of features increases. To address this limitation, feature elimination Random Forests was proposed, which uses only the features with the largest variable importance scores. Yet the performance of this method is not satisfactory, possibly due to its rigid feature selection and the increased correlation between the trees of the forest.

Methods: We propose variable importance-weighted Random Forests, which, instead of sampling features with equal probability at each node when building trees, samples features according to their variable importance scores and then selects the best split from the randomly selected features.

Results: We evaluate the performance of our method through comprehensive simulation and real data analyses, for both regression and classification. Compared to the standard Random Forests and the feature elimination Random Forests methods, our proposed method has improved performance in most cases.

Conclusions: By incorporating the variable importance scores into the random feature selection step, our method can better utilize the more informative features without completely ignoring the less informative ones, and hence has improved prediction accuracy in the presence of weak signals and large noise. We have implemented an R package “viRandomForests” based on the original R package “randomForest”, and it can be freely downloaded from http://zhaocenter.org/software.


Keywords

Random Forests / variable importance score / classification / regression

Cite this article

Yiyi Liu, Hongyu Zhao. Variable importance-weighted Random Forests. Quant. Biol., 2017, 5(4): 338–351. DOI: 10.1007/s40484-017-0121-6


INTRODUCTION

With the rapid development of molecular technologies, huge amounts of high-throughput omics data have been generated. These data provide rich information on various biological processes at the molecular level, and insights learned from them may lead to new tools for disease diagnosis, prognosis, and treatment [1]. However, the large number of features in these data and the existence of complex interactions among these features pose great challenges in extracting useful information for accurate predictions. Random Forests [2], an ensemble method based on classification and regression trees (CART) trained on bootstrapped samples and randomly selected features, has been shown to have superior performance over many other classification and regression methods [2–4] and is commonly used in genomic data analyses [5,6]. However, when the number of features is very large and the signals are relatively weak, its performance tends to decline (see Results and [7] for examples).

An intuitive idea to improve the performance of Random Forests is to evaluate the importance of each feature first and then keep only the most informative ones in a second round of analysis. This is the core idea of several feature elimination Random Forests algorithms [8–10]. As a by-product of Random Forests, the variable importance score (the increase in classification error rate or regression MSE when a feature is randomly permuted) [11] provides an assessment of the informativeness of each feature. The feature elimination methods use this measurement to iteratively select top-ranked features and re-train a Random Forests model based only on the selected features. While showing improvement in some cases [8–10], the main limitations of this feature elimination approach are that it is too rigid in feature selection, sensitive to inaccuracies in the importance estimates, and may lead to a significant increase in the correlation among trees, which can negatively affect performance (see Results).

To overcome this limitation, we propose a soft “feature selection” strategy in this paper. Unlike the feature elimination approach, which keeps only the features with the largest importance scores, we input all the features, as well as their importance scores, into a second-stage Random Forests model. However, in the random feature selection step when splitting a tree node, instead of sampling each feature with equal probability as Random Forests does, our new method samples features according to their importance scores. With this weighted sampling strategy, the final model is able to focus on the most informative features while not completely ignoring contributions from the others. We note that a similar method was proposed specifically for continuous-feature, two-class classification problems (called “enriched Random Forests”) [7]. The authors used marginal t-test (or conditional t-test [12]) q-values to guide the random feature selection. Here, we also extend this marginal testing idea to more general cases by adopting ANOVA for continuous-feature, multiple-class classification, the Chi-squared test for categorical-feature classification, and the F-test (linear regression with a single feature) for regression.

We evaluated the performance of the proposed variable importance-weighted Random Forests (viRF), the standard Random Forests, the feature elimination Random Forests and the marginal screening-based enriched Random Forests through comprehensive simulation studies and the analysis of gene expression data sets. We found that the viRF has better performance in most cases. These results suggest that viRF is effective in using high-dimensional genomic data to construct useful predictive models.

RESULTS

Regression

Simulation models

We consider the following three models in our simulations [13].

1. $y = 10\sin(10\pi x_1) + \epsilon$, with $x_i \sim \mathrm{Unif}(0,1)$, $i = 1, 2, \ldots, d$, and $\epsilon \sim N(0,1)$.

2. $y = 10\sin(\pi x_1 x_2) + 20(x_3 - 0.05)^2 + 10x_4 + 5x_5 + \epsilon$, with $x_i \sim \mathrm{Unif}(0,1)$, $i = 1, 2, \ldots, d$, and $\epsilon \sim N(0,1)$.

3. $y = f(x) + \epsilon$, with $x_i \sim \mathrm{Unif}(0,1)$, $i = 1, 2, \ldots, d$, $\epsilon \sim N(0,1)$, and $f(x)$ follows a tree structure as in Figure 1.

The total number of features, d, was varied from 5 to 200. For each d, we generated training sets with sample size, n, ranging from 10 to 500 (Figures 2–4). The accuracy of each method was evaluated on a testing data set with 500 samples. We repeated the process 50 times and report the average MSEs, where smaller values indicate better performance.
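To make this protocol concrete, below is a minimal sketch (not the authors' code) that generates data from simulation Model 1 and evaluates a standard Random Forests by test-set MSE with the randomForest R package; the particular (n, d) setting and the object names are illustrative.

```r
library(randomForest)

# Simulation Model 1: y = 10*sin(10*pi*x1) + eps, x_i ~ Unif(0,1), eps ~ N(0,1)
simulate_model1 <- function(n, d) {
  X <- matrix(runif(n * d), nrow = n, ncol = d)
  colnames(X) <- paste0("x", seq_len(d))
  y <- 10 * sin(10 * pi * X[, 1]) + rnorm(n)
  list(X = X, y = y)
}

set.seed(1)
train <- simulate_model1(n = 200, d = 100)   # one illustrative (n, d) setting
test  <- simulate_model1(n = 500, d = 100)   # testing set with 500 samples

rf  <- randomForest(x = train$X, y = train$y, ntree = 500)
mse <- mean((predict(rf, test$X) - test$y)^2)   # smaller MSE is better
```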

In simulation Model 1, only one ($x_1$) of the d features is informative. The variable importance-weighted Random Forests (viRF) and the feature elimination ones (feRF), being less influenced by noise features, both outperformed the standard Random Forests (RF) as expected (Figure 2). When the sample size (n) was large relative to the dimension (d), viRF performed better than feRF. We also note that although linear regression was not expected to be effective for the periodic (sine) function, the F-test q-value based enriched Random Forests (eRF) still outperformed the standard RF in most cases (since $x_1$ did get smaller p-values than $x_2, \ldots, x_d$). However, its performance was much worse than that of viRF, indicating an advantage of using Random Forests' variable importance score to quantify feature informativeness in this scenario.

Simulation Model 2 has five informative features ($x_1, \ldots, x_5$). For most d and n, all the modified Random Forests had similar performance, which was better than that of the standard RF (Figure 3).

For the tree structure in Model 3, with five informative features ($x_1, \ldots, x_5$), viRF performed the best in almost all cases (Figure 4).

We also considered three models with functional forms similar to Models 1–3, except that they contain both continuous and categorical features (Supplementary Figure S1). In addition, we considered cases where a certain level of correlation exists between the effective and nuisance features. The relative performance of all the methods was similar to what we observed in Models 1–3 (Supplementary Figures S2–S7).

Overall, these simulation results suggest an improved performance of viRF over standard RF and eRF (with linear model F-test weights) in regression. In addition, the feature weighting strategy also performed better than the feature elimination one in most scenarios except when the number of informative features and the sample size were both very small.

Drug sensitivity prediction

We further assessed the performance of these regression methods using the CCLE drug sensitivity data [14]. The CCLE data contain gene expression profiles of ~500 cancer cell lines and their sensitivities to 24 anticancer drugs. For each drug, we constructed a regression model with the sensitivity measurement (area under the dose-response curve) of a cell line as the response variable and the cell line's expression levels of 10,000 genes as the features. The MSEs estimated with 20 rounds of 5-fold cross-validation are shown in Table 1.
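A hedged sketch of this cross-validation protocol (5-fold, repeated for 20 rounds) is shown below for a standard Random Forests; `expr` and `sensitivity` are placeholder names for the processed CCLE expression matrix and drug-response vector, not objects distributed with the data.

```r
library(randomForest)

# expr: cell lines x 10,000 genes expression matrix (placeholder name)
# sensitivity: numeric drug-response vector (area under dose-response curve)
repeated_cv_mse <- function(expr, sensitivity, k = 5, rounds = 20) {
  n <- length(sensitivity)
  mse <- numeric(rounds)
  for (r in seq_len(rounds)) {
    folds <- sample(rep(seq_len(k), length.out = n))   # random fold assignment
    sq_err <- numeric(n)
    for (f in seq_len(k)) {
      test_idx <- which(folds == f)
      rf <- randomForest(x = expr[-test_idx, ], y = sensitivity[-test_idx])
      sq_err[test_idx] <- (predict(rf, expr[test_idx, ]) - sensitivity[test_idx])^2
    }
    mse[r] <- mean(sq_err)
  }
  mean(mse)   # average MSE over the rounds
}
```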

We observed that for all drugs other than “AZD0530” and “Paclitaxel”, the best-performing methods were always among the weighted Random Forests (viRF or eRF based on marginal F-test q-values), yet no clear winner emerged between these two approaches. This suggests that in practice it is a good strategy to weight features according to their informativeness, so that the final model is dominated by key features while not completely ignoring the less informative ones, keeping it robust to inaccuracies in the evaluation of feature importance and effective dimension.

Classification

Simulation models

We generated three simulated data sets analogous to those in the regression analyses to evaluate the performance of the methods in classification. In simulation Model 1, we consider a continuous-feature, two-class classification problem with only one informative feature ($x_1$); the coefficients of the logistic function were selected so that the two classes have balanced sample sizes. In simulation Model 2, the number of informative features increases to five ($x_1, \ldots, x_5$). In simulation Model 3, we consider a tree structure with five informative continuous features ($x_1, \ldots, x_5$) and four classes. A small data-generating sketch follows the model definitions below.

1. $y \sim \mathrm{Bernoulli}\left(\frac{1}{1+\exp(1-2x_1)}\right)$, with $x_i \sim \mathrm{Unif}(0,1)$, $i = 1, 2, \ldots, d$.

2. $y \sim \mathrm{Bernoulli}\left(\frac{1}{1+\exp\left(-\frac{10\sin(\pi x_1 x_2)+20(x_3-0.05)^2+10x_4+5x_5-20}{3}\right)}\right)$, with $x_i \sim \mathrm{Unif}(0,1)$, $i = 1, 2, \ldots, d$.

3. $y$ follows a tree structure as in Figure 5, with $x_i \sim \mathrm{Unif}(0,1)$, $i = 1, 2, \ldots, d$.
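As referenced above, the following minimal sketch (illustrative names, not the authors' code) generates data from classification Model 1 and evaluates a standard Random Forests by test error rate.

```r
library(randomForest)

# Classification Model 1: P(y = 1 | x) = 1 / (1 + exp(1 - 2*x1)), x_i ~ Unif(0,1)
simulate_class_model1 <- function(n, d) {
  X <- matrix(runif(n * d), nrow = n, ncol = d)
  colnames(X) <- paste0("x", seq_len(d))
  p <- 1 / (1 + exp(1 - 2 * X[, 1]))
  y <- factor(rbinom(n, size = 1, prob = p))   # factor response -> classification
  list(X = X, y = y)
}

set.seed(1)
train <- simulate_class_model1(n = 200, d = 100)
test  <- simulate_class_model1(n = 500, d = 100)

rf  <- randomForest(x = train$X, y = train$y, ntree = 500)
err <- mean(predict(rf, test$X) != test$y)   # classification error rate
```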

We plot the classification error rates of all the methods over a range of training sample sizes in Figures 6–8. Unlike what was observed in the regression cases, the feature elimination Random Forests (feRF) performed almost consistently worse than the other methods. The variable importance-weighted Random Forests (viRF) and the enriched Random Forests (eRF, marginal q-value weighted) both achieved the lowest prediction error rates in most cases, and viRF was the single best classifier in the rest. It is worth noting that for Model 3, which has a tree structure, the standard Random Forests (RF), as an ensemble of trees, performed very well when the dimension was low (occasionally even slightly better than the weighted Random Forests); however, as the number of uninformative features increased, its performance declined and the benefit of weighting the features became clear (Figure 8).

Additionally, we considered three cases where both continuous and categorical features exist (Supplementary Figure S8) and three cases where some nuisance features are correlated with the effective features. The observations were similar to those on Models 1–3 (Supplementary Figures S9–S14).

To gain more insight into these different performances, we investigated the strength of individual trees and the correlation between trees [2] for each method (Supplementary Figures S15–S23). Generally, Random Forests achieves accurate classification by keeping the correlation between trees small while maintaining the strength of individual trees [2]. We note that, compared to the standard RF, the weighted versions (viRF and eRF) have both increased strength and increased correlation; since the gain in strength outweighs the deterioration in correlation, the overall performance is boosted. For feRF, however, the same increase in single-tree strength often comes at a much larger cost in correlation, resulting in worse overall performance.

Cancer (subtype) classification

We then assessed the performance of these methods for cancer/normal and cancer subtype classification. We considered three data sets (Table 2) with various sample sizes and input types.

We performed 5-fold cross-validation for 20 rounds and report the average prediction errors in Table 3. In all three data sets, viRF performed the best. eRF (marginal test q-value weighted) also achieved better performance than the standard RF. However, similar to what we observed in the simulations, feRF performed poorly, even worse than the standard RF.

DISCUSSION

We have proposed a variable importance-weighted Random Forests (viRF) that uses the variable importance scores obtained from a standard Random Forests to sample features in a weighted Random Forests. This enables the final model to rely more on informative features, and hence to cope better than the standard Random Forests with the growing noise as the dimension increases.

Unlike the feature elimination Random Forests, which removes features with small importance scores entirely, our strategy allows features carrying weaker information to be considered in the final model; it is thus more flexible and less prone to inaccuracies that might occur in the feature evaluation and selection steps. Especially when interactions exist between features (simulation Model 3 and the real biological data), the weighted feature sampling method is better able to capture contributions from features that are less important marginally, and has superior performance. In addition, by avoiding the internal cross-validation that feature elimination Random Forests requires to determine the optimal number of features, the weighted Random Forests greatly reduces the computational burden.

In addition, we extended a previous idea (enriched Random Forests) for continuous-feature, two-class classification problems to more general cases by using the corresponding marginal test q-values to guide the random feature selection. Although such marginal tests may not, in theory, provide an ideal evaluation of the features' informativeness, they sometimes produced promising results in our real data analyses as well.

In summary, sampling features according to their informativeness in the random feature selection step can further enhance Random Forests' performance, in both regression and classification. Since the variable importance score estimated by Random Forests, defined as the increase in MSE/error rate when a feature is randomly permuted, provides a reliable measurement of the feature's relevance to the outcome, we consider it a useful weight to assign to features. In our R implementation of the weighted Random Forests, the default weight is the variable importance score; we also give users the option to specify other weights.

MATERIALS AND METHODS

Random Forests and variable importance score

Random Forests is an ensemble of classification and regression trees (CART) [2]. Each tree is grown on a bootstrapped sample from the original data set. At each node, m out of the p total features are randomly selected (random feature selection) and the best split is chosen from them. The constructed trees vote for the most popular class (classification), or their predictions are averaged (regression). For the Random Forests classifier, an upper bound on the generalization error can be derived in terms of the strength of individual trees and the correlation between them [2]:

$$PE^* \leq \frac{\bar{\rho}\,(1 - s^2)}{s^2},$$

where $PE^*$ is the generalization error, $\bar{\rho}$ is the average pairwise correlation between trees, and $s$ is the average single-tree strength, defined as follows:

$$PE^* = P_{X,Y}\Big(P_{\Theta}\big(h(X,\Theta) = Y\big) - \max_{j \neq Y} P_{\Theta}\big(h(X,\Theta) = j\big) < 0\Big),$$

$$s = E_{X,Y}\Big[P_{\Theta}\big(h(X,\Theta) = Y\big) - \max_{j \neq Y} P_{\Theta}\big(h(X,\Theta) = j\big)\Big],$$

$$\bar{\rho} = E_{\Theta,\Theta'}\Big[\mathrm{cor}\Big(I\big(h(X,\Theta) = Y\big) - I\big(h(X,\Theta) = \hat{j}(X,Y)\big),\; I\big(h(X,\Theta') = Y\big) - I\big(h(X,\Theta') = \hat{j}(X,Y)\big)\Big)\Big],$$

where $X$ and $Y$ denote the features and the true class, $\Theta$ and $h(X,\Theta)$ represent a tree and its predicted class for $X$, and $\hat{j}(X,Y) = \arg\max_{j \neq Y} P_{\Theta}\big(h(X,\Theta) = j\big)$.
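As a rough empirical sketch (not the procedure used in the paper), these quantities can be approximated from the per-tree votes exposed by the randomForest package; `rf`, `X_test` and `y_test` below are assumed, illustrative objects (a fitted classification forest, test features and test labels).

```r
# Approximate single-tree strength and average pairwise tree correlation
# for a fitted classification randomForest on a held-out test set.
estimate_strength_correlation <- function(rf, X_test, y_test) {
  votes   <- predict(rf, X_test, predict.all = TRUE)$individual  # n x ntree class labels
  classes <- levels(y_test)
  y_chr   <- as.character(y_test)

  # P_Theta(h(X) = j) estimated by the fraction of trees voting class j
  vote_frac <- t(apply(votes, 1, function(v) table(factor(v, levels = classes)) / length(v)))
  p_true <- vote_frac[cbind(seq_along(y_chr), match(y_chr, classes))]

  # most-voted wrong class j_hat and its vote fraction
  wrong <- vote_frac
  wrong[cbind(seq_along(y_chr), match(y_chr, classes))] <- -Inf
  j_hat  <- classes[max.col(wrong)]
  p_jhat <- apply(wrong, 1, max)

  s <- mean(p_true - p_jhat)                   # empirical strength

  # raw margin of each tree: I(h = Y) - I(h = j_hat), one column per tree
  rmg <- (votes == y_chr) - (votes == j_hat)
  R   <- suppressWarnings(cor(rmg))            # pairwise tree correlations
  rho_bar <- mean(R[upper.tri(R)], na.rm = TRUE)

  c(strength = s, correlation = rho_bar)
}
```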

Random Forests can also generate a variable importance score for each feature, evaluated as the increase in classification error rate or regression MSE (measured on out-of-bag samples, i.e., samples not used to grow a tree) when the feature is randomly permuted [11]. Intuitively, permuting an informative feature will lead to a large increase in classification error rate or regression MSE, while permuting a non-informative feature will not influence the model's performance much. A slightly modified version of the variable importance score takes the uncertainty of the mean increase in error rate/MSE across trees into account by normalizing the value by its standard deviation. We considered both the unnormalized and normalized variable importance scores in our analyses, and the results did not suggest a major performance difference.
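With the randomForest R package, both versions of the permutation-based importance score can be obtained directly; a brief sketch (X_train and y_train are illustrative objects):

```r
library(randomForest)

# Fit a forest with permutation importance enabled
# (works for both classification and regression).
rf <- randomForest(x = X_train, y = y_train, ntree = 500, importance = TRUE)

# type = 1: mean decrease in accuracy (classification) or increase in MSE
# (regression) when each feature is permuted on out-of-bag samples.
vi_unnormalized <- importance(rf, type = 1, scale = FALSE)
vi_normalized   <- importance(rf, type = 1, scale = TRUE)   # divided by its standard error
```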

Variable importance-weighted Random Forests

We propose a two-stage variable importance-weighted Random Forests (viRF) method. In the first stage, we run a standard Random Forests and obtain a variable importance score $w_i$, $i = 1, 2, \ldots, d$, for each feature. Considering that some $w_i$ may be zero or negative, we transform these importance scores as follows:

$$\tilde{w}_i = \begin{cases} \dfrac{1}{d} + \dfrac{w_i}{\max_j w_j}, & \text{if } \max_j w_j > 0, \\[4pt] \dfrac{1}{d}, & \text{otherwise.} \end{cases}$$

In the second stage, we construct another Random Forests. However, instead of sampling the m candidate features with equal probability in the random feature selection step, we sample them with probability proportional to $\tilde{w}_i$, $i = 1, 2, \ldots, d$. The best split is then chosen from these m features.
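The weight transformation and the two-stage flow can be sketched as follows. Note that the node-level weighted feature sampling itself is implemented inside the viRandomForests package; the last lines only illustrate one such weighted draw in base R, and the object names and the choice of m are illustrative.

```r
library(randomForest)

# Stage 1: standard Random Forests to obtain variable importance scores w_i
rf1 <- randomForest(x = X_train, y = y_train, importance = TRUE)
w   <- importance(rf1, type = 1, scale = FALSE)[, 1]

# Transform the scores following the equation above, so that every feature
# keeps a baseline sampling weight of 1/d
transform_weights <- function(w) {
  d <- length(w)
  if (max(w) > 0) 1 / d + w / max(w) else rep(1 / d, d)
}
w_tilde <- transform_weights(w)

# Stage 2: at each node, the m candidate features are drawn with probability
# proportional to w_tilde instead of uniformly (done inside viRandomForests).
# One such weighted draw, for illustration (m shown with the classification
# default sqrt(d); assumes the transformed weights are non-negative):
m <- floor(sqrt(length(w_tilde)))
candidate_features <- sample(seq_along(w_tilde), size = m, prob = w_tilde)
```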

Other methods

Besides Random Forests' variable importance score, we also consider a marginal testing approach to assessing the importance of a feature, and perform the weighted feature sampling using the q-values of the t-test (continuous-feature, two-class classification), ANOVA (continuous-feature, multiple-class classification), the Chi-squared test (categorical-feature classification) or the linear regression F-test (regression), following the idea of [7].
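As a hedged sketch of the regression case, the per-feature linear-model F-test p-values can be computed as below; Benjamini–Hochberg adjusted p-values are used here as a stand-in for q-values, and the final conversion of q-values into sampling weights (1 − q) is only illustrative, as [7] describes its own weighting scheme.

```r
# Marginal F-test p-value for each feature: regress y on that feature alone.
marginal_f_pvalues <- function(X, y) {
  apply(X, 2, function(x) {
    fstat <- summary(lm(y ~ x))$fstatistic            # (value, numdf, dendf)
    pf(fstat[1], fstat[2], fstat[3], lower.tail = FALSE)
  })
}

p <- marginal_f_pvalues(X_train, y_train)   # X_train, y_train: illustrative names
q <- p.adjust(p, method = "BH")             # BH-adjusted p-values as a q-value proxy
feature_weights <- 1 - q                    # larger weight for more significant features
```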

We compare the weighted Random Forests with the feature elimination Random Forests [8–10]. At each iteration, r% (we used r = 30 here) of the features with the smallest importance scores are removed and a Random Forests is constructed from the remaining ones. The number of features used in the final model is determined by cross-validation. We consider both recursive (variable importance scores are updated at each iteration) and non-recursive (variable importance scores are calculated only once, at the first iteration) approaches for feature elimination, and the results did not suggest an obvious difference.
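A compact sketch of the non-recursive elimination loop described above (importance computed once, then the bottom r% dropped at each iteration) is given below; the cross-validation step that picks the final feature count is replaced by out-of-bag errors for brevity, and object names are illustrative.

```r
library(randomForest)

# Non-recursive feature elimination: rank features once, then repeatedly drop
# the bottom r% and refit, recording the out-of-bag error for each feature count.
feature_elimination_rf <- function(X, y, r = 0.30, min_features = 2) {
  vi <- importance(randomForest(x = X, y = y, importance = TRUE),
                   type = 1, scale = FALSE)[, 1]
  keep <- names(sort(vi, decreasing = TRUE))
  results <- data.frame()
  while (length(keep) >= min_features) {
    rf  <- randomForest(x = X[, keep, drop = FALSE], y = y)
    oob <- if (is.factor(y)) rf$err.rate[rf$ntree, "OOB"] else rf$mse[rf$ntree]
    results <- rbind(results, data.frame(n_features = length(keep), oob_error = oob))
    keep <- keep[seq_len(floor(length(keep) * (1 - r)))]   # drop the bottom r%
  }
  results   # the feature count minimizing error would then be chosen by cross-validation
}
```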

Data sets

We downloaded the CCLE data set from https://portals.broadinstitute.org/ccle/home. To reduce the computational cost, we kept the 10,000 genes with the largest coefficients of variation in expression in our analyses. We downloaded the classification data sets from http://archive.ics.uci.edu/ml/datasets/Arcene and http://stat.ethz.ch/%7Edettling/bagboost.html (the latter hosting two of the data sets). All features included in the original data sets were used.

References

[1]

Hanahan, D. and Weinberg, R. A. (2011) Hallmarks of cancer: the next generation. Cell, 144, 646–674

[2]

Breiman, L. (2001) Random forests. Mach. Learn., 45, 5–32

[3]

Palmer, D. S., O’Boyle, N. M., Glen, R. C. and Mitchell, J. B. (2007) Random forest models to predict aqueous solubility. J. Chem. Inf. Model., 47, 150–158

[4]

Jiang, P., Wu, H., Wang, W., Ma, W., Sun, X. and Lu, Z. (2007) MiPred: classification of real and pseudo microRNA precursors using random forest prediction model with combined features. Nucleic Acids Res., 35, W339–W344

[5]

Lee, J. W., Lee, J. B., Park, M. and Song, S. H. (2005) An extensive comparison of recent classification tools applied to microarray data. Comput. Stat. Data Anal., 48, 869–885

[6]

Goldstein, B. A., Polley, E. C. and Briggs, F. B. (2011) Random forests for genetic association studies. Stat. Appl. Genet. Mol. Biol., 10, 32

[7]

Amaratunga, D., Cabrera, J. and Lee, Y. S. (2008) Enriched random forests. Bioinformatics, 24, 2010–2014

[8]

Granitto, P. M., Furlanello, C., Biasioli, F. and Gasperi, F. (2006) Recursive feature elimination with random forest for PTR-MS analysis of agroindustrial products. Chemometr. Intell. Lab., 83, 83–90

[9]

Svetnik, V., Liaw, A., Tong, C. and Wang, T. (2004) Application of Breiman’s random forest to modeling structure-activity relationships of pharmaceutical molecules. Lect. Notes Comput. Sci., 3077, 334–343

[10]

Díaz-Uriarte, R. and de Andrés, S.A. (2006) Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7, 3

[11]

Breiman, L. (2001) Statistical modeling: the two cultures. Stat. Sci., 16, 199–231

[12]

Amaratunga, D. and Cabrera, J. (2009) A conditional t suite of tests for identifying differentially expressed genes in a DNA microarray experiment with little replication. Stat. Biopharm. Res., 1, 26–38

[13]

Biau, G. (2012) Analysis of a random forests model. J. Mach. Learn. Res., 13, 1063–1095

[14]

Barretina, J., Caponigro, G., Stransky, N., Venkatesan, K., Margolin, A. A., Kim, S., Wilson, C. J., Lehár, J., Kryukov, G. V., Sonkin, D., et al. (2012) The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature, 483, 603–607

[15]

Guyon, I., Gunn, S., Ben-Hur, A. and Dror, G. (2004) Result analysis of the NIPS 2003 feature selection challenge. In: Proceedings of the 17th International Conference on Neural Information Processing Systems (NIPS'04), pp. 545–552

[16]

Pomeroy, S. L., Tamayo, P., Gaasenbeek, M., Sturla, L. M., Angelo, M., McLaughlin, M. E., Kim, J. Y., Goumnerova, L. C., Black, P. M., Lau, C., et al. (2002) Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature, 415, 436–442

[17]

Singh, D., Febbo, P. G., Ross, K., Jackson, D. G., Manola, J., Ladd, C., Tamayo, P., Renshaw, A. A., D'Amico, A. V., Richie, J. P., et al. (2002) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1, 203–209

RIGHTS & PERMISSIONS

Higher Education Press and Springer-Verlag Berlin Heidelberg


Supplementary files

QB-07121-OF-ZHY_suppl_1
