INTRODUCTION
Promoter strength, or activity, plays a key role in regulating the transcription of downstream genes. Living cells have evolved promoters spanning a wide range of strengths to fine-tune the expression of key genes and thereby achieve specific physiological functions. For the construction of artificial biological systems, promoters and other regulatory elements of various strengths are likewise indispensable tools for designing controllable circuits or networks. Although random-mutation-based libraries have practical applications [
1–
3], quantitative modeling strategies are still required to improve efficiency and reduce costs when designing large-scale networks and systems. Therefore, methodologies based on model calculation and prediction for designing element sequences are likely to become a future trend. Several modeling methods have been explored to achieve precise prediction or even
de novo design of element sequences, including a series of rational methods (e.g., biophysical modeling [
4,
5]) and irrational methods (e.g., position weight matrix modeling [
6], partial least squares regression modeling [
7], and machine learning based modeling [
8], etc.). Recent progress on construction of such quantitative models was reviewed and discussed by our previous work [
9]. Specifically, as a machine learning based method, an artificial neural network (ANN) was employed to characterize the highly nonlinear relationship between promoter sequence and strength [
8], and a high regression correlation coefficient of
R2 = 0.96 was achieved for both model training and testing, far outstripping modeling methods based on linear regression (or its derivatives). The success of this example demonstrates the application prospects of machine learning methods in predicting promoter strength. Another important machine learning method, the Support Vector Machine (SVM), was therefore applied to build such models in this work.
SVM was developed by Vapnik [
10,
11] in the 1990s based on statistical learning theory. It applies kernel functions to map the input data into a higher-dimensional feature space, turning a nonlinear problem into a linear one in that space. Compared with ANN, SVM is a newer and more principled machine learning algorithm. Traditional learning methods such as ANN use the Empirical Risk Minimization (ERM) criterion to minimize the error on training samples, which easily leads to overfitting. In contrast, SVM adopts the Structural Risk Minimization (SRM) criterion rather than ERM, which helps it avoid the local minima and overfitting that commonly occur in ANN modeling and thus improves the generalization ability of the model. This advantage is especially prominent for small samples. Other advantages over ANN mainly include [
12]: i) automatic structure selection; ii) better performance in nonlinear, high-dimensional pattern recognition and function regression; and iii) a more rigorous mathematical derivation and proof. Accordingly, SVM is well suited to both classification and regression problems. Owing to these advantages, SVMs have been widely employed in many fields of artificial intelligence, such as handwritten character recognition, face recognition, text classification, and data mining [
12].
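For context, the standard ε-insensitive support vector regression formulation introduced by Vapnik is summarized here for the reader; the notation is generic textbook notation rather than taken from this work. The method seeks a function f(x) = w^{T}\phi(x) + b by solving

\min_{w,\,b,\,\xi,\,\xi^{*}} \ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right)
\quad \text{s.t.} \quad
y_i - f(x_i) \le \varepsilon + \xi_i,\ \ f(x_i) - y_i \le \varepsilon + \xi_i^{*},\ \ \xi_i,\,\xi_i^{*} \ge 0,

where \phi is the feature map induced by the kernel, C balances model complexity (the SRM term \lVert w\rVert^{2}) against training error, and \varepsilon is the precision of the ε-insensitive loss; the RBF kernel used below corresponds to K(x, x') = \exp\left(-\lVert x - x'\rVert^{2}/(2\sigma^{2})\right).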
In the field of life sciences, SVM is also a powerful tool for building effective prediction models in bioinformatics and computational systems biology, such as protein structure and stability prediction [
13,
14], RNA secondary structure prediction [
15], bacterial transcription start sites prediction [
16], virtual screening for drug discovering [
17–
20], drug metabolism prediction [
21], disease prognosis and prediction [
22,
23], as well as promoter recognition and structure analysis [
24–
32]. However, SVM has not previously been reported for predicting the strength of promoters or other regulatory elements. Given its advantages over ANN, SVM should be able to yield a precise model for predicting promoter strength even when trained on a small dataset. To this end, we set out to construct a high-performing SVM model for prediction of promoter strength. After multi-parameter optimization, model training and testing, we obtained a best-performing model that accurately outputs a predicted strength value from a promoter sequence (Figure 1).
RESULTS
Model construction and training
The complex relationship between promoter sequence (
x) and its strength (
y) is assumed to be captured by an SVM regression function y = f(x). To this end, SVM models were constructed according to Vapnik
et al. [
10,
11]. The SVM toolbox [
33] running on the Matlab platform was employed to build, train and test the SVM models for promoter strength prediction. The performance of the constructed SVM models was evaluated by the following two indexes:
i) the Mean Squared Error (MSE),

MSE = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^{2},

and ii) the squared correlation coefficient (R2),

R^{2} = \left[\frac{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)\left(\hat{y}_i-\bar{\hat{y}}\right)}{\sqrt{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^{2}\,\sum_{i=1}^{n}\left(\hat{y}_i-\bar{\hat{y}}\right)^{2}}}\right]^{2},

where \hat{y}_i and y_i are the predicted and experimentally measured strength values, respectively. Several kernel functions, including the polynomial, sigmoid, and radial basis function (RBF) kernels, were tried one by one in preliminary experiments, and the RBF kernel was found to be most suitable for fitting the data.
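The kernel comparison can be sketched as follows. The paper performed this step with an SVM toolbox in Matlab [33]; the sketch below uses scikit-learn's SVR as a stand-in, with the paper's squared correlation coefficient computed as the squared Pearson correlation. The names `X` (one-hot encoded sequences) and `y` (relative strengths) and the fixed C and epsilon values are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import mean_squared_error

def compare_kernels(X, y, kernels=("poly", "sigmoid", "rbf")):
    """Cross-validated MSE and squared correlation for each candidate kernel."""
    results = {}
    for kernel in kernels:
        model = SVR(kernel=kernel, C=1.0, epsilon=0.1)
        y_pred = cross_val_predict(model, X, y, cv=5)   # 5-fold CV predictions
        mse = mean_squared_error(y, y_pred)
        r2 = np.corrcoef(y, y_pred)[0, 1] ** 2          # squared Pearson correlation, as in the text
        results[kernel] = (mse, r2)
    return results
```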
A mutation library containing 100 promoter sequences and their corresponding strength values [
8] was randomly divided into a training set and a test set for training and testing of the SVM models, respectively. Different numbers of training sequences (from 10 to 90) were tried one by one to determine the minimum training-set size required to reach the best prediction performance. Each size was independently and randomly sampled five times, and the maximum
R2 and minimum
MSE values for prediction of the test set were calculated. As a result,
R2 increases and
MSE decreases as the size of the training set increases (Figure 2). The best prediction performance was achieved with 90 sequences, similar to that of the ANN [
8]. Therefore, the model was trained on 90 sequences and tested on the remaining 10 sequences.
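A minimal sketch of this training-set size sweep, again using scikit-learn's SVR in place of the Matlab toolbox; the five random samplings per size follow the description above, and all variable names are placeholders.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

def size_sweep(X, y, sizes=range(10, 100, 10), repeats=5, seed=0):
    """Best R2 and lowest MSE on the held-out set for each training-set size."""
    rng = np.random.RandomState(seed)
    summary = {}
    for n_train in sizes:
        r2_list, mse_list = [], []
        for _ in range(repeats):
            X_tr, X_te, y_tr, y_te = train_test_split(
                X, y, train_size=n_train, random_state=rng.randint(1 << 30))
            model = SVR(kernel="rbf").fit(X_tr, y_tr)
            y_pred = model.predict(X_te)
            r2_list.append(np.corrcoef(y_te, y_pred)[0, 1] ** 2)
            mse_list.append(mean_squared_error(y_te, y_pred))
        summary[n_train] = (max(r2_list), min(mse_list))  # as reported per size
    return summary
```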
To achieve a smaller MSE or a higher R2, a range of feasible values of each parameter (the balance factor C of the loss function and the width σ of the RBF kernel) was tried in different combinations, under different settings of the precision error ε, to find the best parameters. In general, the MSE increased with large C and σ, for both the training (Figure 3A‒3D) and test (Figure 3E‒3H) sets. Although the minimum mean MSEs (0.561 for training and 0.394 for test) and the minimum maximum MSEs (2.39 for training and 2.32 for test; Figure 3D and 3H) were both achieved under one setting of ε, the C and σ giving the lowest MSE occurred under a different setting. Hence, this best combination of C, σ and ε was chosen to retrain the model, yielding the best model, termed ‘OptModel’. The fit of ‘OptModel’ to the training set shows a high correlation, with R2 > 0.99 (Figure 4A).
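The MSE surfaces shown in Figure 3 can be reproduced, panel by panel, with a sketch like the one below (scikit-learn SVR again standing in for the Matlab toolbox). The conversion gamma = 1/(2σ²) expresses the kernel width σ in scikit-learn's RBF parameterization; the grids themselves are placeholders.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

def mse_surface(X_tr, y_tr, X_te, y_te, C_grid, sigma_grid, epsilon):
    """Test-set MSE over a (C, sigma) grid at a fixed epsilon, i.e. one Figure-3-style panel."""
    surface = np.zeros((len(C_grid), len(sigma_grid)))
    for i, C in enumerate(C_grid):
        for j, sigma in enumerate(sigma_grid):
            # gamma = 1/(2*sigma^2) maps the kernel width onto scikit-learn's RBF parameter
            model = SVR(kernel="rbf", C=C, gamma=1.0 / (2 * sigma ** 2), epsilon=epsilon)
            y_pred = model.fit(X_tr, y_tr).predict(X_te)
            surface[i, j] = mean_squared_error(y_te, y_pred)
    return surface
```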
Model test, prediction and evaluation
The performance of ‘OptModel’ was evaluated by applying it to predict the test set. The fitting results indicate that a strong correlation was achieved (
R2 > 0.98, Figure 4B), and the model accurately predicts the strength of each promoter in the test set (Figure 4C), indicating that overfitting was successfully avoided. The
R2 values for both training and test exceed the corresponding values (
R2 = 0.96 for training and test) obtained by the ANN model presented in our previous work [
8]. For a more intuitive comparison, the fitting results were further plotted against those of the ANN (Figure 5). The SVM predictions are more tightly concentrated along the diagonal, indicating a better fit and more precise prediction than the ANN model in this case.
Next, the effect of single-base mutations on promoter strength was evaluated with OptModel. Each base of the wildtype sequence was mutated to each of the other three bases (e.g., ‘A’ was changed to ‘C’, ‘G’, or ‘T’), and the strength of the resulting sequence was predicted one by one (Figure 6A). The highest strength, 1.42, occurs at mutation 209
A→G, and the lowest strength 0.48 appears at 196
A→G. An average strength of 0.9 was obtained over all 672 single-base mutations, which is lower than that of the wildtype sequence. In addition, the ‘key points’ that strongly influence promoter strength (≥1.2 or ≤0.8) were picked out from Figure 6A. Of these ‘key points’, 82 are negative mutations (≤0.8), far more than the 8 positive ones (≥1.2). Most of the mutations (541/672) agree with the ANN predictions [
8], and only ~19% show a significant difference (absolute difference ≥ 0.2) between the SVM and ANN predictions (Figure 6B), indicating comparable prediction performance of the two methods.
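The single-base mutation scan can be sketched as follows, assuming a trained model `opt_model` with a scikit-learn-style predict interface and an `encode` function implementing the one-hot rule described in METHODS; these names are placeholders rather than the paper's own code.

```python
import numpy as np

BASES = ("A", "G", "C", "T")

def single_base_scan(opt_model, wildtype, encode):
    """Predict the strength of every single-base mutant of the wildtype promoter."""
    results = {}  # (position, new_base) -> predicted relative strength
    for pos, wt_base in enumerate(wildtype):
        for base in BASES:
            if base == wt_base:
                continue
            mutant = wildtype[:pos] + base + wildtype[pos + 1:]
            x = encode(mutant).reshape(1, -1)                    # single-sample input
            results[(pos + 1, base)] = float(np.abs(opt_model.predict(x))[0])
    return results  # 3 * len(wildtype) predictions, e.g. 672 for a 224-bp promoter
```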
DISCUSSION
Numerous studies have demonstrated a direct correlation between the sequence of a regulatory element and its strength/activity, and many quantitative prediction models have been constructed to bridge the gap between sequence and strength [
9]. Among these models, machine learning based methods such as ANN have been introduced in recent years, and their better prediction performance indicates the potential of this methodology for synthetic regulatory element design [
8]. In this study, another important machine learning method, SVM, was used to build similar models, and its prediction performance proved to be no worse than that of ANN, demonstrating a promising application prospect for SVM in predicting prokaryotic promoter strength as well. Moreover, the methodology generalizes well across problems and is expected to be applicable to eukaryotic promoters, and even to other regulatory elements such as terminators.
Machine learning technologies have been widely used in artificial intelligence (AI) and have made tremendous progress; in particular, the rapid development of intelligent robots is heralding the era of ‘Industry 4.0’. Early this year, a powerful ANN-based machine named AlphaGo, designed by Google DeepMind to play the game of Go, surprisingly beat the world champion Lee Sedol by a clear margin. In the life sciences, the introduction of machine learning based methods has greatly promoted the development of the discipline, especially modeling in bioinformatics and systems biology. Beyond the applications mentioned above, this work exemplifies the use of SVM modeling in a new field, the prediction of promoter strength. Considering the small sample sizes typically produced by biological experiments, as well as the advantage of SVM in learning from small samples, this methodology should also be suitable for building prediction models on small datasets such as the one used here.
Although the introduction of machine learning methods can raise prediction accuracy to a very high level compared with traditional methods, some limitations of these AI algorithms remain difficult to overcome. It is well known that the prediction performance of machine learning algorithms depends directly on the ‘knowledge’ they have learnt. In this study, for example, the mutated sequences were generated by error-prone PCR based on the wildtype promoter, which introduced a mutation rate of only <30% relative to the initial sequence; the model therefore cannot learn adequately from this “pseudo-random” mutation data, and its generalization ability is correspondingly weakened. The best model trained on this library may not precisely predict the strength of sequences with a ≥30% mutation rate. Furthermore, 90 training sequences are negligible compared with the full sequence space (4^224 possible sequences for a 224-bp promoter), so the information the model can learn is very limited. Although SVM is supposed to construct high-performance models from small samples, the prediction performance of such models degraded significantly when a smaller training set (fewer than 90 sequences) was used (see Figure 2). To be sure, the more information a machine learning algorithm learns, the more powerful the resulting model can be, just as it is now very hard for a human player to beat the well-trained AlphaGo. With the rapid growth of experimental data, more powerful and intelligent AI models may be constructed that learn extensively from multiple datasets, including the strengths of different regulatory elements from various species, thereby greatly improving precision and generalization ability and freeing us from repetitive and laborious experiments.
METHODS
Computational platform and tools
Matlab 2013a (Mathworks Inc., http://www.mathworks.com/) was run on a personal computer with the Microsoft Windows 10 operating system (Microsoft Inc., http://www.microsoft.com/). The SVM Toolbox [
33] was integrated into Matlab and served as the computational tool for SVM model construction, training and prediction. All calculations and simulations were programmed and run in the Matlab environment with the SVM Toolbox.
Data sources and dataset preparation
A mutation library of
Escherichia coli Trc promoter, containing 100 promoter sequences and their corresponding strength values, was constructed in our previous work [
8]. Briefly, error-prone PCR was performed on the Trc promoter region of plasmid pTrcHis2B (224 bp, including the -35 box, -10 box, RBS, and other regions) to introduce random mutations; then a
gfp gene was inserted into the plasmid as a reporter; finally, the promoter strength was assayed by detecting GFP expression in
E. coli using flow cytometry. The strength of each sequence is a relative value compared with that of the wildtype sequence. In this work, the library was used to train and test the SVM models. As in the previous work, the library was randomly divided into two datasets, a training set and a test set (Supplementary Dataset S1 shows an example sampling used to train the best-performing model ‘OptModel’).
The original sequence data, coded by ‘A’, ‘G’, ‘C’ and ‘T’, were translated into a digital matrix for SVM model input according to the following orthogonal encoding: ‘A’ = [1, 0, 0, 0], ‘G’ = [0, 1, 0, 0], ‘C’ = [0, 0, 1, 0], and ‘T’ = [0, 0, 0, 1]. For instance, the sequence ‘AGTGCC’ is translated into the ‘0‒1’ digital series [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0] under this conversion rule.
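A minimal sketch of this encoding rule follows; the function name `encode` and the use of NumPy are my own choices for illustration, not from the paper.

```python
import numpy as np

# Orthogonal (one-hot) codes as defined in the text
CODE = {"A": [1, 0, 0, 0], "G": [0, 1, 0, 0], "C": [0, 0, 1, 0], "T": [0, 0, 0, 1]}

def encode(sequence):
    """Translate a DNA string into a flat 0/1 vector (4 positions per base)."""
    return np.array([bit for base in sequence.upper() for bit in CODE[base]])

# Example from the text: 'AGTGCC' -> 24-element 0/1 series
assert encode("AGTGCC").tolist() == [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1,
                                     0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0]
```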
Model training and test
A range of values was set for each parameter when training the SVM model, including the balance factor C, the width σ of the kernel function, and the precision error ε. Other settings included the ‘eInsensitive’ loss function and the ‘rbf’ kernel function. A two-layer nested loop was employed to search for the best parameters during model training and test. The mean squared error (MSE) and squared correlation coefficient (R2) were calculated as indexes to evaluate the performance of model training and test under each parameter setting. An absolute-value operation was applied to the predictions for the training and test data, since the strength values are non-negative. Finally, the best combination of parameters (C, σ and ε) was used to retrain and generate the best model.
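A sketch of this parameter search is given below, with scikit-learn's SVR and GridSearchCV standing in for the Matlab SVM Toolbox; note that GridSearchCV scores candidates by cross-validation within the training set rather than by the paper's fixed test-set evaluation, and the grids passed in are illustrative placeholders. The conversion gamma = 1/(2σ²) again maps the kernel width onto scikit-learn's RBF parameter.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

def search_and_retrain(X_train, y_train, C_grid, sigma_grid, eps_grid):
    """Grid-search C, sigma and epsilon, then return the best model retrained on all training data."""
    param_grid = {
        "C": list(C_grid),
        "gamma": [1.0 / (2 * s ** 2) for s in sigma_grid],
        "epsilon": list(eps_grid),
    }
    search = GridSearchCV(SVR(kernel="rbf"), param_grid,
                          scoring="neg_mean_squared_error", cv=5)
    search.fit(X_train, y_train)
    opt_model = search.best_estimator_          # refit on the full training set by default
    y_fit = np.abs(opt_model.predict(X_train))  # strengths are non-negative, as in the text
    return opt_model, y_fit, search.best_params_
```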