Modeling the relationship between gene expression and mutational signature
Limin Jiang, Hui Yu, Yan Guo
Background: Mutational signatures computed from somatic mutations allow an in-depth understanding of tumorigenesis and may inform early prevention strategies. Many studies have shown regulatory effects linking somatic mutations to gene expression dysregulation.
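To show what "computed from somatic mutations" involves, it helps to see how a mutation catalog is usually built. Single base substitution (SBS) signatures are commonly derived from a catalog in which each substitution is classified by its pyrimidine-normalized reference base and its two flanking bases, giving 96 channels. The sketch below is a minimal illustration under stated assumptions: mutations arrive as hypothetical `(5' base, ref, alt, 3' base)` tuples whose flanking bases were already looked up in the reference genome, and the function names are illustrative, not taken from the authors' pipeline or any specific tool.

```python
from collections import Counter

# Complement map used to express purine-reference substitutions on the pyrimidine strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sbs96_channel(five_prime: str, ref: str, alt: str, three_prime: str) -> str:
    """Return the 96-channel label, e.g. 'A[C>T]G', for one single base substitution.

    Hypothetical helper: assumes the flanking bases were already looked up
    in the reference genome and that ref != alt.
    """
    if ref == alt:
        raise ValueError("reference and alternate bases must differ")
    if ref in ("A", "G"):
        # Purine reference: report the substitution on the opposite strand,
        # which complements every base and swaps the 5' and 3' neighbours.
        five_prime, three_prime = COMPLEMENT[three_prime], COMPLEMENT[five_prime]
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{five_prime}[{ref}>{alt}]{three_prime}"


def sbs96_catalog(mutations):
    """Count mutations per channel for one sample (the input to signature analysis)."""
    return Counter(sbs96_channel(*m) for m in mutations)


# 6 substitution classes x 4 possible 5' bases x 4 possible 3' bases = 96 channels.
ALL_CHANNELS = [
    f"{a}[{r}>{t}]{b}"
    for r, alts in (("C", "AGT"), ("T", "ACG"))
    for t in alts
    for a in "ACGT"
    for b in "ACGT"
]
```

For example, a G>A change in the context T_A is reported as its reverse complement, T[C>T]A, so both strands of the same event fall into one channel.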
Methods: We hypothesized that mutational signatures and gene expression are associated. We used RNA-seq data to model 49 established mutational signatures in 33 cancer types, with accuracy and area under the curve (AUC) as performance measures in five-fold cross-validation.
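The evaluation scheme in the Methods (accuracy and AUC under five-fold cross-validation) can be sketched with the standard library alone. This is a minimal sketch under loud assumptions: each sample is a numeric expression vector with a hypothetical binary label (for instance, signature active vs. inactive), folds are stratified so every test fold contains both classes, and a simple nearest-centroid classifier stands in for the support vector machine, random forest, and XGBoost models named in the keywords. It is not the authors' code.

```python
import random
from statistics import mean


def auc_score(labels, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroids(X, y):
    """Stand-in classifier: per-class mean expression vector (not SVM/RF/XGBoost)."""
    centroids = {}
    for cls in (0, 1):
        rows = [x for x, lab in zip(X, y) if lab == cls]
        centroids[cls] = [mean(col) for col in zip(*rows)]
    return centroids


def score(centroids, x):
    """Positive when x is closer to the class-1 centroid than to the class-0 centroid."""
    d0 = sum((a - b) ** 2 for a, b in zip(x, centroids[0]))
    d1 = sum((a - b) ** 2 for a, b in zip(x, centroids[1]))
    return d0 - d1


def cross_validate(X, y, k=5, seed=0):
    """Stratified k-fold CV returning (mean accuracy, mean AUC) over the folds.

    Assumes each class has at least k samples so every test fold holds both classes.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, lab in enumerate(y) if lab == cls]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)  # deal each class round-robin across folds
    accs, aucs = [], []
    for test in folds:
        test_set = set(test)
        train = [i for i in range(len(X)) if i not in test_set]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        s = [score(model, X[i]) for i in test]
        truth = [y[i] for i in test]
        preds = [1 if v > 0 else 0 for v in s]
        accs.append(mean([int(p == t) for p, t in zip(preds, truth)]))
        aucs.append(auc_score(truth, s))
    return mean(accs), mean(aucs)
```

Stratifying the folds matters here because AUC is undefined on a test fold that lacks either class; swapping in a real SVM, random forest, or gradient-boosting model would only change `fit_centroids` and `score`.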
Results: A total of 475 models using unconstrained genes and 112 models using protein-coding genes were selected for future inference. Validation on an independent gene expression dataset on lung cancer smoking status achieved over 80% in both accuracy and area under the curve.
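The selection rule stated in this paper, keeping a model only when its accuracy and its AUC are both greater than 0.8, amounts to a simple filter followed by a tally per gene set. The sketch below assumes a hypothetical `ModelResult` record per (cancer type, signature, gene set) model; the field names are illustrative and not taken from the EMSI tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelResult:
    """Cross-validated performance of one (cancer type, signature, gene set) model (hypothetical record)."""
    cancer_type: str
    signature: str
    gene_set: str  # e.g. "unconstrained" or "protein-coding"
    accuracy: float
    auc: float


def select_models(results, threshold=0.8):
    """Keep models whose accuracy AND AUC are both strictly greater than the threshold."""
    return [r for r in results if r.accuracy > threshold and r.auc > threshold]


def count_by_gene_set(selected):
    """Tally the selected models per gene set."""
    counts = {}
    for r in selected:
        counts[r.gene_set] = counts.get(r.gene_set, 0) + 1
    return counts
```

Applied to the paper's cross-validation results, this rule is what yields the 475 unconstrained-gene models and 112 protein-coding models, 587 in total; a model at exactly 0.8 on either measure is excluded because the criterion is "greater than".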
Conclusion: These results demonstrate that the associations between gene expression and somatic mutations can translate into the associations between gene expression and mutational signatures.
To overcome the limitations of non-negative matrix factorization when mutations are sparse, this paper presents a method that predicts mutational signatures from RNA-seq data. The method was first used to model the associations between gene expression and 49 established mutational signatures. A total of 587 successful models covering 31 cancer types were then obtained, where a model counted as successful only if its accuracy and AUC (area under the curve) both exceeded 0.8. Finally, all successful models were assembled into an online tool (EMSI) as a component of the MutEx analysis suite; the models can be accessed on the innovebioinfo website.
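For context on the non-negative matrix factorization (NMF) mentioned above: signature extraction typically factors a catalog matrix V (mutation channels x samples) into W (channels x k signatures) and H (k signatures x samples, the exposures). The sketch below uses the Lee-Seung multiplicative updates for squared Frobenius error in plain Python; it is a generic illustration of NMF, not the implementation used by the authors or by any particular signature tool, and `nmf`, `matmul`, and `frobenius_error` are hypothetical helper names.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product of nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Swap rows and columns of a nested-list matrix."""
    return [list(col) for col in zip(*A)]


def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (channels x samples) as W (channels x k) @ H (k x samples).

    Lee-Seung multiplicative updates for squared Frobenius error; starting
    from positive values keeps W and H non-negative. eps avoids division by zero.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * nu / (d + eps) for h, nu, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), elementwise, using the updated H
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * nu / (d + eps) for w, nu, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


def frobenius_error(V, W, H):
    """Frobenius norm of the reconstruction residual V - W H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for vr, xr in zip(V, WH) for v, x in zip(vr, xr)) ** 0.5
```

One way to read the sparse-mutation limitation: when a sample carries only a handful of mutations, its column of V holds very few counts, so the exposures estimated for it in H rest on little information.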
mutational signature / gene expression / support vector machine / random forest / extreme gradient boosting
Abbreviation | Definition |
AESA | Advanced expression survival analysis |
ACC | Adrenocortical carcinoma |
BLCA | Bladder urothelial carcinoma |
BRCA | Breast invasive carcinoma |
CESC | Cervical squamous cell carcinoma and endocervical adenocarcinoma |
CHOL | Cholangiocarcinoma |
COAD | Colon adenocarcinoma |
DLBC | Lymphoid neoplasm diffuse large B-cell lymphoma |
eQTL | Expression quantitative trait loci |
ESCA | Esophageal carcinoma |
GBM | Glioblastoma multiforme |
HNSC | Head and neck squamous cell carcinoma |
KICH | Kidney chromophobe |
KIRC | Kidney renal clear cell carcinoma |
KIRP | Kidney renal papillary cell carcinoma |
LAML | Acute myeloid leukemia |
LGG | Brain lower grade glioma |
LIHC | Liver hepatocellular carcinoma |
LUAD | Lung adenocarcinoma |
LUSC | Lung squamous cell carcinoma |
MESO | Mesothelioma |
OV | Ovarian serous cystadenocarcinoma |
PAAD | Pancreatic adenocarcinoma |
PCPG | Pheochromocytoma and paraganglioma |
PRAD | Prostate adenocarcinoma |
READ | Rectum adenocarcinoma |
RF | Random forest |
ROC | Receiver operating characteristic |
SARC | Sarcoma |
SBS | Single base substitutions |
SKCM | Skin cutaneous melanoma |
STAD | Stomach adenocarcinoma |
SVM | Support vector machine |
TCGA | The Cancer Genome Atlas |
TGCT | Testicular germ cell tumors |
THCA | Thyroid carcinoma |
THYM | Thymoma |
UCEC | Uterine corpus endometrial carcinoma |
UCS | Uterine carcinosarcoma |
UV | Ultraviolet |
UVM | Uveal melanoma |
XGBoost | Extreme gradient boosting |