DeepM6ASeq-EL: prediction of human N6-methyladenosine (m6A) sites with LSTM and ensemble learning

Juntao CHEN , Quan ZOU , Jing LI

Front. Comput. Sci. ›› 2022, Vol. 16 ›› Issue (2) : 162302

DOI: 10.1007/s11704-020-0180-0
Artificial Intelligence
RESEARCH ARTICLE

Abstract

N6-methyladenosine (m6A) is a prevalent methylation modification that plays a vital role in various biological processes, such as metabolism, mRNA processing, synthesis, and transport. Recent studies have suggested that m6A modification is related to common diseases such as cancer, tumours, and obesity. Accurate prediction of methylation sites in RNA sequences has therefore emerged as a critical issue in bioinformatics. Traditional high-throughput sequencing and wet-bench experimental techniques are costly, time-consuming, and can identify sites inaccurately. Nevertheless, researchers using these experimental methods have produced many large databases of m6A sites. With the support of these databases and existing deep learning methods, we developed an m6A site predictor named DeepM6ASeq-EL, which integrates an ensemble of five LSTM and CNN classifiers combined by hard voting. Compared to the state-of-the-art prediction method WHISTLE (average AUC 0.948 and 0.880), DeepM6ASeq-EL had a lower accuracy in m6A site prediction (average AUC: 0.861 for the full transcript models and 0.809 for the mature messenger RNA models) when tested on six independent datasets.
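
The hard-voting step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each base classifier emits a binary label per candidate site (1 for m6A, 0 otherwise); the function name `hard_vote` and the example labels are hypothetical and are not taken from the authors' implementation.

```python
from collections import Counter
from typing import List, Sequence


def hard_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Majority vote over per-classifier label lists (one list per classifier)."""
    if not predictions:
        raise ValueError("need at least one classifier")
    n_sites = len(predictions[0])
    if any(len(p) != n_sites for p in predictions):
        raise ValueError("all classifiers must label the same sites")
    voted = []
    # zip(*predictions) groups the labels that all classifiers gave to one site.
    for site_labels in zip(*predictions):
        # With an odd number of binary classifiers a tie cannot occur.
        voted.append(Counter(site_labels).most_common(1)[0][0])
    return voted


# Hypothetical 0/1 calls from five base classifiers on four candidate sites.
base_calls = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
]
ensemble_calls = hard_vote(base_calls)
```

Using an odd number of base classifiers, such as the five described here, means a binary majority vote always has a clear winner.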

Keywords

N6-methyladenosine / site prediction / LSTM / CNN / ensemble learning

Cite this article

Juntao CHEN, Quan ZOU, Jing LI. DeepM6ASeq-EL: prediction of human N6-methyladenosine (m6A) sites with LSTM and ensemble learning. Front. Comput. Sci., 2022, 16(2): 162302. DOI: 10.1007/s11704-020-0180-0

RIGHTS & PERMISSIONS

Higher Education Press
