DeepM6ASeq-EL: prediction of human N6-methyladenosine (m6A) sites with LSTM and ensemble learning

Juntao CHEN, Quan ZOU, Jing LI

Front. Comput. Sci., 2022, Vol. 16, Issue 2: 162302. DOI: 10.1007/s11704-020-0180-0
Artificial Intelligence
RESEARCH ARTICLE


Abstract

N6-methyladenosine (m6A) is a prevalent methylation modification that plays a vital role in various biological processes, such as metabolism, mRNA processing, synthesis, and transport. Recent studies have suggested that m6A modification is related to common diseases such as cancer, tumours, and obesity. Accurate prediction of methylation sites in RNA sequences has therefore emerged as a critical issue in bioinformatics. Traditional high-throughput sequencing and wet-bench experimental techniques are costly, time-consuming, and prone to inaccurate site identification. Nevertheless, these experimental methods have produced many large databases of m6A sites. With the support of these databases and existing deep learning methods, we developed an m6A site predictor named DeepM6ASeq-EL, which integrates an ensemble of five LSTM and CNN classifiers combined by hard voting. When tested on six independent datasets, DeepM6ASeq-EL had a lower accuracy in m6A site prediction (average AUC: 0.861 for the full transcript models and 0.809 for the mature messenger RNA models) than the state-of-the-art prediction method WHISTLE (average AUC: 0.948 and 0.880).
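
The hard-voting step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each of the five base classifiers has already produced a 0/1 label per candidate site; the function name hard_vote and the example labels are hypothetical and are not taken from the DeepM6ASeq-EL code. The ensemble label is simply the label chosen by the majority of classifiers.

```python
from typing import List, Sequence


def hard_vote(votes: Sequence[Sequence[int]]) -> List[int]:
    """Majority vote over binary labels.

    votes[i][j] is the 0/1 label that classifier i assigns to sample j.
    A sample is labelled 1 only if more than half of the classifiers say 1.
    """
    n_classifiers = len(votes)
    if n_classifiers == 0:
        raise ValueError("at least one classifier is required")
    n_samples = len(votes[0])
    if any(len(v) != n_samples for v in votes):
        raise ValueError("all classifiers must label the same samples")
    return [
        1 if 2 * sum(v[j] for v in votes) > n_classifiers else 0
        for j in range(n_samples)
    ]


# Hypothetical labels from five base classifiers for four candidate sites
# (illustrative values only, not results from the paper).
votes = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
]
ensemble_labels = hard_vote(votes)
print(ensemble_labels)
```

With an odd number of binary classifiers, as here, ties cannot occur, which is one practical reason to combine five models rather than four.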

Keywords

N6-methyladenosine / site prediction / LSTM / CNN / ensemble learning

Cite this article

Juntao CHEN, Quan ZOU, Jing LI. DeepM6ASeq-EL: prediction of human N6-methyladenosine (m6A) sites with LSTM and ensemble learning. Front. Comput. Sci., 2022, 16(2): 162302 https://doi.org/10.1007/s11704-020-0180-0


Acknowledgements

The work was supported by the National Natural Science Foundation of China (Grant Nos. 61922020, 61771331, 91935302).

RIGHTS & PERMISSIONS

2022 Higher Education Press