A penalized integrative deep neural network for variable selection among multiple omics datasets

Yang Li, Xiaonan Ren, Haochen Yu, Tao Sun, Shuangge Ma

Quant. Biol. 2024, Vol. 12, Issue (3): 313-323. DOI: 10.1002/qub2.51

METHOD

Abstract

Deep learning has become increasingly popular in omics data analysis, and recent work that incorporates variable selection into deep learning has greatly enhanced model interpretability. However, because deep learning requires a large sample size, existing methods may produce uncertain findings when the sample size is small, as is common in omics data analysis. With omics data from multiple populations and studies now widely available, existing methods naively pool them into one dataset to increase the sample size, ignoring that variable structures can differ across datasets, which can lead to inaccurate variable selection. We propose a penalized integrative deep neural network (PIN) to simultaneously select important variables from multiple datasets. PIN directly aggregates multiple datasets as input and accommodates both homogeneity and heterogeneity among the datasets within an integrative analysis framework. Results from extensive simulation studies, and from applications of PIN to gene expression datasets from older adults with different cognitive statuses and from ovarian cancer patients at different stages, show that PIN considerably outperforms existing methods across multiple datasets. The source code is freely available on GitHub (rucliyang/PINFunc). We expect that PIN will promote the identification of disease-related important variables based on multiple studies/datasets from diverse origins.
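To make the distinction between homogeneity and heterogeneity concrete, the following is a minimal sketch of how group-type penalties on first-layer network weights can encode the two situations. It is an illustration only, not the PIN objective: the abstract does not specify the penalty, and the weight layout `(M, p, h)`, the function names, and the tuning parameters `lam`, `lam1`, `lam2` are all assumptions made for this example.

```python
import numpy as np

# Assumed layout (hypothetical): W[m, j, :] holds the first-layer weights
# that connect input variable j to the h hidden units for dataset m.

def homogeneity_penalty(W, lam):
    """Group-lasso-style penalty: one group per variable across ALL datasets,
    so a variable tends to be kept or dropped in every dataset together."""
    group_norms = np.sqrt((W ** 2).sum(axis=(0, 2)))  # shape (p,)
    return lam * group_norms.sum()

def heterogeneity_penalty(W, lam1, lam2):
    """Sparse-group-lasso-style penalty: an outer norm per variable plus an
    inner norm per (dataset, variable) pair, so a variable may be selected
    in some datasets but not in others."""
    outer = np.sqrt((W ** 2).sum(axis=(0, 2))).sum()  # across datasets
    inner = np.sqrt((W ** 2).sum(axis=2)).sum()       # within each dataset
    return lam1 * outer + lam2 * inner

def selected_variables(W, tol=1e-8):
    """Boolean (M, p) matrix: True where variable j has non-negligible
    first-layer weights in dataset m."""
    return np.sqrt((W ** 2).sum(axis=2)) > tol

# Toy example: 2 datasets, 3 variables, 4 hidden units.
W = np.zeros((2, 3, 4))
W[0, 0, :] = 1.0  # variable 0 is active in dataset 0 only

print(homogeneity_penalty(W, lam=1.0))
print(heterogeneity_penalty(W, lam1=1.0, lam2=1.0))
print(selected_variables(W))
```

In this sketch, a penalty of either form would be added to the network's training loss; which form is appropriate depends on whether the important variables are assumed to be shared across all datasets or allowed to differ between them.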

Keywords

deep learning / integrative analysis / multiple omics datasets / variable selection

Cite this article

Yang Li, Xiaonan Ren, Haochen Yu, Tao Sun, Shuangge Ma. A penalized integrative deep neural network for variable selection among multiple omics datasets. Quant. Biol., 2024, 12(3): 313-323. DOI: 10.1002/qub2.51



RIGHTS & PERMISSIONS

2024 The Author(s). Quantitative Biology published by John Wiley & Sons Australia, Ltd on behalf of Higher Education Press.
