Selecting near-native protein structures from ab initio models using ensemble clustering

Li Li; Huanqian Yan; Yonggang Lu

doi:10.1007/s40484-018-0158-1

Quant. Biol. 2018, Vol. 6, Issue 4: 307-312. DOI: 10.1007/s40484-018-0158-1

RESEARCH ARTICLE

Abstract

Background: Ab initio protein structure prediction aims to predict the tertiary structure of a protein from its amino acid sequence alone. It is an important topic in bioinformatics, and considerable effort has been devoted to designing ab initio methods. Unfortunately, because no perfect energy function is available, selecting a good near-native structure from the predicted decoy structures in the final step remains a difficult task.

Methods: Here we propose an ensemble clustering method based on k-medoids to address this problem. The k-medoids method is run many times to generate a clustering ensemble, and a voting method is then used to combine the clustering results. A confidence score that considers both cluster size and cluster similarity is defined to select the final near-native model.

Results: We applied the method to 54 single-domain targets in CASP-11. For about 70.4% of these targets, the proposed method selects better near-native structures than the SPICKER method used by the I-TASSER server.

Conclusions: The experiments show that the proposed method is effective in selecting near-native structures from decoy sets for different targets, as measured by the similarity between the selected structure and the native structure.
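The ensemble procedure outlined in the Methods paragraph can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the voting rule or the confidence score, so the sketch assumes a co-association majority vote (two decoys are linked if they share a cluster in more than half of the runs) and a hypothetical score equal to the cluster size fraction multiplied by 1 / (1 + mean distance to the cluster medoid). The function names, the toy one-dimensional "decoys" and the hand-picked initial medoids are likewise assumptions made for illustration; a real decoy set would use a pairwise structural distance such as RMSD and randomly drawn initial medoids.

```python
from itertools import combinations


def kmedoids(dist, init_medoids, max_iter=100):
    """Plain alternating k-medoids on a precomputed distance matrix."""
    n = len(dist)
    medoids = list(init_medoids)
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest medoid (ties go to the lowest cluster id).
        labels = [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
                  for i in range(n)]
        # Update step: the new medoid minimises the total distance inside its cluster.
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i in range(n) if labels[i] == c]
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
                if members else old)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels


def consensus_clusters(label_runs):
    """Assumed voting rule: link two points that share a cluster in more than half of the runs."""
    n = len(label_runs[0])
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        votes = sum(labels[i] == labels[j] for labels in label_runs)
        if 2 * votes > len(label_runs):
            parent[find(i)] = find(j)
    # Consensus clusters are the connected components of the majority-vote links.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def select_model(dist, clusters):
    """Hypothetical confidence score: (size / n) * 1 / (1 + mean distance to the cluster medoid)."""
    n = len(dist)
    best_score, best_model = -1.0, None
    for members in clusters:
        medoid = min(members, key=lambda m: sum(dist[m][j] for j in members))
        mean_d = sum(dist[medoid][j] for j in members) / len(members)
        score = (len(members) / n) / (1.0 + mean_d)
        if score > best_score:
            best_score, best_model = score, medoid
    return best_model, best_score


# Toy data: 1-D coordinates stand in for decoys; absolute difference stands in for RMSD.
coords = [0, 1, 2, 3, 4, 20, 23]
dist = [[abs(a - b) for b in coords] for a in coords]
inits = [(0, 1), (2, 5), (0, 6)]  # three runs with different initial medoids
runs = [kmedoids(dist, init) for init in inits]
clusters = consensus_clusters(runs)
model, score = select_model(dist, clusters)

print(runs[0])                 # [0, 0, 0, 1, 1, 1, 1]  (a poor split from a poor start)
print(clusters)                # [[0, 1, 2, 3, 4], [5, 6]]
print(model, round(score, 3))  # 2 0.325
```

In this toy case the first run converges to a poor partition because both initial medoids fall in the same group, while the other two runs agree; the majority vote recovers the two natural groups, and the selected model is the medoid of the larger, tighter consensus cluster.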

Author summary

Selecting a good near-native structure from the decoy structures produced by ab initio structure prediction methods is a difficult task. The k-medoids method is commonly used for this purpose because of its simplicity and efficiency. However, the result of k-medoids may be affected by the selection of its initial cluster centers (medoids). This paper proposes a new ensemble clustering method based on k-medoids to address this problem. The experiments show that the proposed method is effective in selecting near-native structures from decoy sets for different targets.

Keywords

near-native structure / protein structure prediction / ab initio / decoy / ensemble clustering / k-medoids

Cite this article

Li Li, Huanqian Yan, Yonggang Lu. Selecting near-native protein structures from ab initio models using ensemble clustering. Quant. Biol., 2018, 6(4): 307‒312 https://doi.org/10.1007/s40484-018-0158-1

ACKNOWLEDGEMENTS

This work was supported by the National Key R&D Program of China (Grant No. 2017YFE0111900) and the Lanzhou Talents Program for Innovation and Entrepreneurship (No. 2016-RC-93).

COMPLIANCE WITH ETHICS GUIDELINES

The authors Li Li, Huanqian Yan and Yonggang Lu declare that they have no conflicts of interest.

This article does not contain any studies with human or animal subjects performed by any of the authors.

RIGHTS & PERMISSIONS

2018 Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature

Received: 09 Mar 2018
Revised: 23 Apr 2018
Accepted: 05 May 2018
Published: 10 Dec 2018
Online First Date: 30 Nov 2018
Issue Date: 10 Dec 2018
