Selecting near-native protein structures from ab initio models using ensemble clustering

Li Li, Huanqian Yan, Yonggang Lu

Quant. Biol. 2018, Vol. 6, Issue 4: 307–312. DOI: 10.1007/s40484-018-0158-1
RESEARCH ARTICLE

Abstract

Background: Ab initio protein structure prediction aims to predict the tertiary structure of a protein from its amino acid sequence alone. This is an important topic in bioinformatics, and considerable effort has gone into designing ab initio methods. Unfortunately, because no perfect energy function exists, selecting a good near-native structure from the predicted decoy structures in the final step remains a difficult task.

Methods: Here we propose an ensemble clustering method based on k-medoids to address this problem. The k-medoids method is run many times to generate a clustering ensemble, and a voting method then combines the clustering results. A confidence score that considers both the cluster size and the cluster similarity is defined to select the final near-native model.
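
The Methods paragraph can be illustrated with a minimal Python/NumPy sketch. It assumes the decoys have already been reduced to a symmetric pairwise distance matrix `D` (for example, RMSD between decoys; the abstract does not state which measure is used). The voting step is implemented here as a co-association matrix with a majority threshold, which is one common way of combining a clustering ensemble, and the confidence score (relative cluster size times mean intra-cluster similarity) is a hypothetical stand-in. The paper's exact voting rule and score may differ, and all function names and parameters (`k`, `n_runs`, `threshold`) are illustrative.

```python
import numpy as np


def k_medoids(D, k, rng, max_iter=100):
    """One alternating k-medoids run on distance matrix D from random initial medoids."""
    medoids = rng.choice(D.shape[0], size=k, replace=False)
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)  # assign each decoy to its nearest medoid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # new medoid: the member with the smallest summed distance to its cluster
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return np.argmin(D[:, medoids], axis=1)


def co_association(D, k, n_runs, seed=0):
    """Fraction of runs in which each pair of decoys lands in the same cluster."""
    rng = np.random.default_rng(seed)
    C = np.zeros(D.shape)
    for _ in range(n_runs):
        labels = k_medoids(D, k, rng)
        C += labels[:, None] == labels[None, :]
    return C / n_runs


def majority_vote_clusters(C, threshold=0.5):
    """Link pairs co-clustered in more than `threshold` of the runs; return connected components."""
    n = C.shape[0]
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over the "voted together" graph
            i = stack.pop()
            for j in np.flatnonzero(C[i] > threshold):
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def select_near_native(D, k, n_runs=100, seed=0):
    """Return (index of the selected decoy, confidence score of its consensus cluster)."""
    labels = majority_vote_clusters(co_association(D, k, n_runs, seed))
    n, d_max = D.shape[0], max(D.max(), 1e-12)
    best_model, best_score = -1, -np.inf
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        sub = D[np.ix_(members, members)]
        # Hypothetical confidence score: relative cluster size times mean
        # intra-cluster similarity (1 - normalised distance).
        score = (members.size / n) * (1.0 - sub.mean() / d_max)
        if score > best_score:
            best_score = score
            best_model = members[np.argmin(sub.sum(axis=1))]  # medoid of the consensus cluster
    return int(best_model), float(best_score)


# Toy 2-D points standing in for decoys: six tightly packed, three further apart.
X = np.array([[0, 0], [0.1, 0], [0, 0.1], [-0.1, 0], [0, -0.1], [0.05, 0.05],
              [10, 10], [11, 10], [10, 11]])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
model, score = select_near_native(D, k=2, n_runs=200)
print(model, round(score, 3))
```

With enough runs, the selected model should come from the tight group of six points, because that group is both larger and more compact than the other one.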

Results: We applied the method to 54 single-domain targets from CASP11. For about 70.4% of these targets, the proposed method selects a better near-native structure than SPICKER, the clustering method used by the I-TASSER server.

Conclusions: The experiments show that the proposed method is effective in selecting near-native structures from decoy sets for different targets, as measured by the similarity between the selected structure and the native structure.

Author summary

Selecting a good near-native structure from the decoy structures produced by ab initio structure prediction methods is a difficult task. The k-medoids method is commonly used for this purpose because of its simplicity and efficiency. However, its result can depend on the choice of initial medoids. This paper proposes a new ensemble clustering method based on k-medoids to address this problem. The experiments show that the proposed method is effective in selecting near-native structures from decoy sets for different targets.
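
The sensitivity to initialization described above can be shown on toy data. This sketch assumes a precomputed distance matrix and uses the simple alternating (Voronoi-iteration) variant of k-medoids rather than the PAM swap procedure; the point coordinates and the function name `k_medoids` are hypothetical. The same algorithm is started from two different sets of initial medoids.

```python
import numpy as np


def k_medoids(D, init, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix D, started from
    the given initial medoid indices `init`. Returns one cluster label per item."""
    medoids = np.array(init)
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)  # nearest-medoid assignment
        new_medoids = medoids.copy()
        for c in range(len(medoids)):
            members = np.flatnonzero(labels == c)
            if members.size:
                # new medoid: the member with the smallest summed distance to its cluster
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return np.argmin(D[:, medoids], axis=1)


# Toy 2-D points standing in for decoys: six tightly packed, three further away.
X = np.array([[0, 0], [0.1, 0], [0, 0.1], [-0.1, 0], [0, -0.1], [0.05, 0.05],
              [10, 10], [11, 10], [10, 11]])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

labels_good = k_medoids(D, init=[0, 6])  # one initial medoid in each group
labels_bad = k_medoids(D, init=[0, 3])   # both initial medoids in the dense group
print(labels_good)  # [0 0 0 0 0 0 1 1 1]
print(labels_bad)   # [0 0 0 1 1 0 0 0 0]
```

Starting with both medoids inside the dense group leaves the algorithm in a poor local optimum in which the three distant points share a cluster with most of the dense group. Running k-medoids many times and combining the results by voting reduces the dependence on any single unlucky start.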

Keywords

near-native structure / protein structure prediction / ab initio / decoy / ensemble clustering / k-medoids

Cite this article

Li Li, Huanqian Yan, Yonggang Lu. Selecting near-native protein structures from ab initio models using ensemble clustering. Quant. Biol., 2018, 6(4): 307‒312 https://doi.org/10.1007/s40484-018-0158-1

ACKNOWLEDGEMENTS

This work is supported by the National Key R&D Program of China (Grant No. 2017YFE0111900) and the Lanzhou Talents Program for Innovation and Entrepreneurship (No. 2016-RC-93).

COMPLIANCE WITH ETHICS GUIDELINES

The authors Li Li, Huanqian Yan and Yonggang Lu declare that they have no conflicts of interest.
This article does not contain any studies with human or animal subjects performed by any of the authors.

RIGHTS & PERMISSIONS

2018 Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature