A feature extraction framework for discovering pan-cancer driver genes based on multi-omics data
Xiaomeng Xue, Feng Li, Junliang Shang, Lingyun Dai, Daohui Ge, Qianqian Ren
The identification of tumor driver genes facilitates accurate cancer diagnosis and treatment and plays a key role in precision oncology, along with gene signaling, regulation, and their interaction with protein complexes. To tackle the challenge of distinguishing driver genes within large volumes of genomic data, we construct a feature extraction framework for discovering pan-cancer driver genes based on multi-omics data (mutations, gene expression, copy number variants, and DNA methylation) combined with protein–protein interaction (PPI) networks. Using a network propagation algorithm, we mine functional information among nodes in the PPI network, focusing on genes with weak node information to represent cancer-specific information. From these functional features, we extract three kinds of pan-cancer features: distribution features of the pan-cancer data; TOPSIS features, computed from the functional features with the ideal solution method; and SetExpan features, which rank the pan-cancer data by average inverse rank. These features represent the information shared across cancer types. Finally, we use the LightGBM classification algorithm for gene prediction. Experimental results show that our method outperforms existing methods in terms of the area under the precision-recall curve (AUPRC) and performs better across different PPI networks. This indicates our framework's effectiveness in predicting potential cancer genes, offering valuable insights for the diagnosis and treatment of tumors.
cancer driver genes / feature extraction / multi-omics data / network propagation / pan-cancer
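The feature extraction steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy four-gene network, the seed matrix, the restart parameter `alpha`, the equal TOPSIS weights, and the function names `propagate`, `topsis`, and `mean_reciprocal_rank` are all assumptions made for illustration. Network propagation is shown here as a random walk with restart, and the SetExpan-style feature as a plain average of inverse ranks across cancer types. The final LightGBM classification step is omitted.

```python
import numpy as np


def propagate(adj, seed, alpha=0.5, tol=1e-8, max_iter=1000):
    """Random walk with restart: p <- alpha * W p + (1 - alpha) * seed."""
    adj = np.asarray(adj, dtype=float)
    seed = np.asarray(seed, dtype=float)
    deg = adj.sum(axis=0)
    deg[deg == 0] = 1.0          # avoid division by zero for isolated nodes
    w = adj / deg                # column-normalised transition matrix
    p = seed.copy()
    for _ in range(max_iter):
        nxt = alpha * (w @ p) + (1 - alpha) * seed
        if np.abs(nxt - p).sum() < tol:
            return nxt
        p = nxt
    return p


def topsis(x, weights=None):
    """Closeness of each row to the ideal solution (all columns are benefits)."""
    x = np.asarray(x, dtype=float)
    norm = np.sqrt((x ** 2).sum(axis=0))
    norm[norm == 0] = 1.0
    if weights is None:          # assumption: equal weight per cancer type
        weights = np.full(x.shape[1], 1.0 / x.shape[1])
    v = x / norm * weights
    best, worst = v.max(axis=0), v.min(axis=0)
    d_best = np.sqrt(((v - best) ** 2).sum(axis=1))
    d_worst = np.sqrt(((v - worst) ** 2).sum(axis=1))
    denom = d_best + d_worst
    denom[denom == 0] = 1.0
    return d_worst / denom


def mean_reciprocal_rank(x):
    """Average of 1 / rank over columns, where rank 1 is the top score."""
    x = np.asarray(x, dtype=float)
    order = (-x).argsort(axis=0)
    ranks = np.empty_like(order)
    positions = np.arange(1, x.shape[0] + 1)
    for j in range(x.shape[1]):
        ranks[order[:, j], j] = positions
    return (1.0 / ranks).mean(axis=1)


# Toy PPI network: a path of four genes, 0 - 1 - 2 - 3 (hypothetical data).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
# Rows are genes, columns are three hypothetical cancer types.
seed = np.array([[1, 0, 0],
                 [0, 0, 0],
                 [0, 1, 0],
                 [0, 0, 1]])

features = propagate(adj, seed)              # functional features per cancer type
topsis_score = topsis(features)              # one pan-cancer score per gene
mrr_score = mean_reciprocal_rank(features)   # rank-ensemble score per gene
```

In this sketch gene 1 has no seed signal of its own but receives a nonzero score from its neighbours after propagation, which is the sense in which propagation helps genes with weak node information. The per-gene `topsis_score` and `mrr_score` vectors could then be concatenated with other features and passed to a classifier.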