Selecting near-native protein structures from ab initio models using ensemble clustering

Li Li , Huanqian Yan , Yonggang Lu

Quant. Biol. ›› 2018, Vol. 6 ›› Issue (4) : 307 -312.

PDF (1344KB)
Quant. Biol. ›› 2018, Vol. 6 ›› Issue (4) : 307 -312. DOI: 10.1007/s40484-018-0158-1
RESEARCH ARTICLE
RESEARCH ARTICLE

Selecting near-native protein structures from ab initio models using ensemble clustering

Author information +
History +
PDF (1344KB)

Abstract

Background: Ab initio protein structure prediction is to predict the tertiary structure of a protein from its amino acid sequence alone. As an important topic in bioinformatics, considerable efforts have been made on designing the ab initio methods. Unfortunately, lacking of a perfect energy function, it is a difficult task to select a good near-native structure from the predicted decoy structures in the last step.

Methods: Here we propose an ensemble clustering method based on k-medoids to deal with this problem. The k-medoids method is run many times to generate clustering ensembles, and then a voting method is used to combine the clustering results. A confidence score is defined to select the final near-native model, considering both the cluster size and the cluster similarity.

Results: We have applied the method to 54 single-domain targets in CASP-11. For about 70.4% of these targets, the proposed method can select better near-native structures compared to the SPICKER method used by the I-TASSER server.

Conclusions: The experiments show that, the proposed method is effective in selecting the near-native structure from decoy sets for different targets in terms of the similarity between the selected structure and the native structure.

Graphical abstract

Keywords

near-native structure / protein structure prediction / ab initio / decoy / ensemble clustering / k-medoids

Cite this article

Download citation ▾
Li Li, Huanqian Yan, Yonggang Lu. Selecting near-native protein structures from ab initio models using ensemble clustering. Quant. Biol., 2018, 6(4): 307-312 DOI:10.1007/s40484-018-0158-1

登录浏览全文

4963

注册一个新账户 忘记密码

References

[1]

UniProtKB/TrEMBL Protein Database Release Statistics.

[2]

Zhang, Y. and Skolnick, J. (2004) Automated structure prediction of weakly homologous proteins on a genomic scale. Proc. Natl. Acad. Sci. USA, 101, 7594–7599

[3]

Huang, D. S., Zhao, X. M., Huang, G. B. and Cheung, Y. M. (2006) Classifying protein sequences using hydropathy blocks. Pattern Recognit., 39, 2293–2300

[4]

Xia, J. F., Zhao, X. M., Song, J. and Huang, D. S. (2010) APIS: accurate prediction of hot spots in protein interfaces by combining protrusion index with solvent accessibility. BMC Bioinformatics, 11, 174

[5]

Huang, D. S., Zhang, L., Han, K., Deng, S., Yang, K. and Zhang, H. (2014) Prediction of protein-protein interactions based on protein-protein correlation using least squares regression. Curr. Protein Pept. Sci., 15, 553–560

[6]

Shortle, D., Simons, K. T. and Baker, D. (1998) Clustering of low-energy conformations near the native structures of small proteins. Proc. Natl. Acad. Sci. USA, 95, 11158–11162

[7]

Kaufman L. and Rousseeuw P. J. (1987) Clustering by means of medoids. In Statistical Data Analysis Based on The Ll-Norm and Related Methods, Dodge , Y. (ed.). Basel: Birkhäuser Basel

[8]

Deng, Z., Choi, K. S., Jiang, Y., Wang, J. and Wang, S. (2016) A survey on soft subspace clustering. Inf. Sci., 348, 84–106

[9]

Bradley, P., Malmström, L., Qian, B., Schonbrun, J., Chivian, D., Kim, D. E., Meiler, J., Misura, K. M. and Baker, D. (2005) Free modeling with Rosetta in CASP6. Proteins, 61, 128–134

[10]

Jain, A. K. (2010) Data clustering: 50 years beyond K-means. Pattern Recognit. Lett., 31, 651–666

[11]

Snyder, D. A. and Montelione, G. T. (2005) Clustering algorithms for identifying core atom sets and for assessing the precision of protein structure ensembles. Proteins, 59, 673–686

[12]

Asur S., Ucar D., and Parthasarathy S. (2006) An ensemble approach for clustering protein-protein interaction networks. Bioinfomatics, 23, i29-i40.

[13]

Pirim H. and Seker S.E. (2012) Ensemble clustering for biological datasets. In Bioinformatics, Pérez-Sánchez, H., (Ed.). IntechOpen,

[14]

Zhang, Y. and Skolnick, J. (2004) Scoring function for automated assessment of protein structure template quality. Proteins, 57, 702–710

[15]

Moult, J., Pedersen, J. T., Judson, R. and Fidelis, K. (1995) A large-scale experiment to assess protein structure prediction methods. Proteins, 23, ii–v

[16]

Yang, J., Yan, R., Roy, A., Xu, D., Poisson, J. and Zhang, Y. (2015) The I-TASSER Suite: protein structure and function prediction. Nat. Methods, 12, 7–8

[17]

Zhang, Y. (2008) I-TASSER server for protein 3D structure prediction. BMC Bioinformatics, 9, 40

[18]

The 11th Critical Assessment of Techniques for Protein Structure Prediction.

[19]

Zhang, Y. and Skolnick, J. (2004) SPICKER: a clustering approach to identify near-native protein folds. J. Comput. Chem., 25, 865–871

[20]

Vega-Pons, S. and Ruiz-Shulcloper, J. (2011) A survey of clustering ensemble algorithms. Int. J. Pattern Recognit. Artif. Intell., 25, 337–372

RIGHTS & PERMISSIONS

Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature

AI Summary AI Mindmap
PDF (1344KB)

1430

Accesses

0

Citation

Detail

Sections
Recommended

AI思维导图

/