piEnPred: a bi-layered discriminative model for enhancers and their subtypes via novel cascade multi-level subset feature selection algorithm
Zaheer Ullah KHAN, Dechang PI, Shuanglong YAO, Asif NAWAZ, Farman ALI, Shaukat ALI
Enhancers are short cis-acting DNA elements that can be bound by proteins (activators) to increase the likelihood that a particular gene will be transcribed. Enhancers play a significant role in protein formation and in regulating gene transcription. Genetic variation in enhancers has been associated with human diseases such as cancer, inflammatory bowel disease, Parkinson’s disease, addiction, and schizophrenia. In the current study, we build a novel and more robust bi-layered computational model. The representative feature vector was constructed as a linear combination of six features, and the optimal hybrid feature vector was obtained via the novel Cascade Multi-Level Subset Feature Selection (CMSFS) algorithm. The first layer predicts enhancers, and the second layer predicts their subtypes. The baseline model obtained 87.88% accuracy, 95.29% sensitivity, 80.47% specificity, an MCC of 0.766, and an ROC value of 0.9603 on layer 1. Similarly, the model obtained 68.24% accuracy, 65.54% sensitivity, 70.95% specificity, an MCC of 0.3654, and an ROC value of 0.7568 on layer 2. On an independent dataset, piEnPred achieved 80.4% accuracy, 82.5% sensitivity, 78.4% specificity, and an MCC of 0.6099 on layer 1, and 72.5% accuracy, 70.0% sensitivity, 75% specificity, and an MCC of 0.4506 on layer 2. The proposed method performed remarkably well compared with other state-of-the-art predictors. For the convenience of experimental scientists, a user-friendly, freely accessible web server has been developed at bienhancer dot pythonanywhere dot com.
enhancer / enhancer types / novel CM-SFS algorithm / feature selection / SVM
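The abstract reports accuracy, sensitivity, specificity, and MCC for each layer. As a minimal sketch of how these four measures relate to one another, the Python snippet below computes them from confusion-matrix counts using the standard definitions. The function name `binary_metrics` and the counts are hypothetical and are not taken from the paper: the counts were chosen to reproduce the reported layer-1 sensitivity and specificity under the assumption of equal-sized positive and negative classes, which the abstract does not state.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute standard binary-classification measures from raw counts.

    tp/fn: positives predicted correctly / incorrectly
    tn/fp: negatives predicted correctly / incorrectly
    """
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal total is zero; report 0.0 in that case.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts (assumption: 10,000 positives and 10,000 negatives),
# scaled so that sensitivity = 95.29% and specificity = 80.47%.
layer1 = binary_metrics(tp=9529, fn=471, tn=8047, fp=1953)

for name, value in layer1.items():
    print(f"{name}: {value:.4f}")
```

Under the equal-class-size assumption, accuracy is the mean of sensitivity and specificity, which agrees with the reported layer-1 figure of 87.88%, and the resulting MCC rounds to the reported 0.766.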