SySAP: a system-level predictor of deleterious single amino acid polymorphisms

Tao Huang1,2, Chuan Wang1, Guoqing Zhang1, Lu Xie2, Yixue Li1,2

Protein & Cell, 2012, Vol. 3, Issue 1: 38–43. DOI: 10.1007/s13238-011-1130-2
COMMUNICATION

Abstract

Single amino acid polymorphisms (SAPs), also known as non-synonymous single nucleotide polymorphisms (nsSNPs), are responsible for a large proportion of human genetic diseases. Discriminating deleterious SAPs from neutral ones can help identify disease genes and understand disease mechanisms. In this work, we established a system-level method for predicting deleterious SAPs. Unlike most existing methods, which consider only sequence and structure information, our method also incorporates network information, and this integration can improve the performance of deleterious SAP prediction. To make our method available to the public, we developed SySAP (a System-level predictor of deleterious Single Amino acid Polymorphisms), an easy-to-use and highly accurate web server. SySAP is freely available at http://www.biosino.org/SySAP/ and http://lifecenter.sgst.cn/SySAP/.
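The abstract does not spell out SySAP's actual features or classifier, so the following is only a minimal sketch, assuming a linear (logistic) scorer over one concatenated feature vector. The feature names `conservation` and `buried_fraction`, the toy interaction network `PPI`, and the weights are all hypothetical. The network feature shown, normalized degree centrality, is one standard graph measure and not necessarily the one SySAP uses. The sketch only illustrates how a network feature can sit alongside sequence and structure features in a single prediction.

```python
import math
from dataclasses import dataclass

# Hypothetical toy protein-protein interaction (PPI) network,
# stored as an undirected adjacency map: protein -> set of partners.
PPI = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1"},
    "P3": {"P1", "P4"},
    "P4": {"P1", "P3"},
}


def degree_centrality(graph, node):
    """Normalized degree centrality: number of partners divided by (n - 1).

    Proteins absent from the network get 0.0.
    """
    n = len(graph)
    if n < 2:
        return 0.0
    return len(graph.get(node, ())) / (n - 1)


@dataclass
class SAP:
    protein: str            # protein carrying the amino acid substitution
    conservation: float     # hypothetical sequence feature in [0, 1]
    buried_fraction: float  # hypothetical structure feature in [0, 1]


def features(sap, graph):
    """Concatenate sequence, structure and network features into one vector."""
    return [sap.conservation, sap.buried_fraction, degree_centrality(graph, sap.protein)]


def deleterious_probability(x, weights, bias):
    """Logistic score sigmoid(w . x + b); values >= 0.5 would be called deleterious."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hand-picked illustrative weights: NOT trained and NOT SySAP's model.
WEIGHTS = [2.0, 1.5, 1.0]
BIAS = -2.5

if __name__ == "__main__":
    # Same sequence/structure features, different network context.
    for protein in ("P1", "P2"):
        sap = SAP(protein, conservation=0.9, buried_fraction=0.8)
        p = deleterious_probability(features(sap, PPI), WEIGHTS, BIAS)
        print(protein, round(p, 3))
```

With identical sequence and structure inputs, the substitution in the highly connected protein P1 scores higher than the one in the peripheral protein P2, which is the kind of extra signal network information can add. In a real predictor the weights would be learned from labelled deleterious and neutral SAPs rather than set by hand.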

Keywords

deleterious single amino acid polymorphisms / predictor / web server

Cite this article

Tao Huang, Chuan Wang, Guoqing Zhang, Lu Xie, Yixue Li. SySAP: a system-level predictor of deleterious single amino acid polymorphisms. Protein Cell, 2012, 3(1): 38–43. https://doi.org/10.1007/s13238-011-1130-2
