piEnPred: a bi-layered discriminative model for enhancers and their subtypes via novel cascade multi-level subset feature selection algorithm

Zaheer Ullah KHAN, Dechang PI, Shuanglong YAO, Asif NAWAZ, Farman ALI, Shaukat ALI

Front. Comput. Sci., 2021, Vol. 15, Issue (6): 156904. DOI: 10.1007/s11704-020-9504-3
RESEARCH ARTICLE


Abstract

Enhancers are short DNA cis-elements that can be bound by proteins (activators) to increase the likelihood that transcription of a particular gene will occur. Enhancers play a significant role in the formation of proteins and in regulating the gene transcription process. Human diseases such as cancer, inflammatory bowel disease, Parkinson’s, addiction, and schizophrenia have been attributed to genetic variation in enhancers. In the current study, we built a more robust and novel bi-layered computational model. The representative feature vector was constructed as a linear combination of six features. The optimum hybrid feature vector was obtained via the novel Cascade Multi-Level Subset Feature Selection (CMSFS) algorithm. The first layer predicts enhancers, and the second layer predicts their subtypes. On layer 1, the baseline model obtained an accuracy of 87.88%, a sensitivity of 95.29%, a specificity of 80.47%, an MCC of 0.766, and an ROC value of 0.9603. Similarly, on layer 2 the model obtained 68.24%, 65.54%, 70.95%, 0.3654, and 0.7568 for accuracy, sensitivity, specificity, MCC, and ROC value, respectively. On an independent dataset, piEnPred achieved 80.4% accuracy, 82.5% sensitivity, 78.4% specificity, and an MCC of 0.6099 on layer 1, and 72.5% accuracy, 70.0% sensitivity, 75% specificity, and an MCC of 0.4506 on layer 2. The proposed method performed remarkably well compared with other state-of-the-art predictors. For the convenience of experimental scientists, a user-friendly, freely accessible web server has been developed at bienhancer.pythonanywhere.com.

Keywords

enhancer / enhancer types / novel CM-SFS algorithm / feature selection / SVM

Cite this article

Zaheer Ullah KHAN, Dechang PI, Shuanglong YAO, Asif NAWAZ, Farman ALI, Shaukat ALI. piEnPred: a bi-layered discriminative model for enhancers and their subtypes via novel cascade multi-level subset feature selection algorithm. Front. Comput. Sci., 2021, 15(6): 156904 https://doi.org/10.1007/s11704-020-9504-3


RIGHTS & PERMISSIONS

2021 Higher Education Press