PCA for predicting quaternary structure of protein

WANG Tong, SHEN Hongbin, YAO Lixiu, YANG Jie, CHOU Kuochen

Front. Electr. Electron. Eng., 2008, Vol. 3, Issue 4: 376–380. DOI: 10.1007/s11460-008-0084-5

Abstract

The number and arrangement of the subunits that form a protein are referred to as its quaternary structure. Knowing the quaternary structure of an uncharacterized protein provides clues to its biological function and to how it interacts with other molecules in a biological system. With the explosion of protein sequences generated in the post-genomic age, it is vital to develop an automated method for predicting it. To address this problem, we represent each protein sample with the pseudo position-specific score matrix (Pse-PSSM) descriptor proposed by Chou and Shen. The Pse-PSSM descriptor has the advantage of combining evolutionary information with sequence-correlated information. However, incorporating all these effects into one descriptor may cause a ‘high dimension disaster’. Chou and Shen overcame this problem with a fusion approach. Here we take a completely different approach: principal component analysis (PCA), a linear dimensionality reduction algorithm, is introduced to extract key features from the high-dimensional Pse-PSSM space. The resulting dimension-reduced descriptor vector is a compact representation of the original high-dimensional vector. The jackknife test results indicate that the dimensionality reduction approach is efficient in coping with complicated problems in biological systems, such as predicting the quaternary structure of proteins.
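
The pipeline outlined in the abstract (project a high-dimensional descriptor onto a few principal components, then score a classifier with the jackknife test) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not give the descriptor dimension, the number of retained components, or the classifier, so the 100-dimensional synthetic data, the choice of 10 components, the 1-nearest-neighbor rule, and the function names `pca_fit`, `pca_transform`, and `jackknife_accuracy` are all assumptions made for the example. The jackknife test is taken here to mean leave-one-out validation.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return (mean, components) for a PCA of the rows of X, computed via SVD."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, components):
    """Project the rows of X onto the retained principal axes."""
    return (X - mean) @ components.T

def jackknife_accuracy(X, y, n_components):
    """Leave-one-out accuracy of a 1-nearest-neighbor rule in PCA space.

    The 1-NN classifier is an assumption for illustration only.
    PCA is refit on each training fold so the held-out sample
    never influences the projection.
    """
    n = len(X)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i
        mean, comps = pca_fit(X[train], n_components)
        Z_train = pca_transform(X[train], mean, comps)
        z_test = pca_transform(X[i:i + 1], mean, comps)
        dists = np.linalg.norm(Z_train - z_test, axis=1)
        pred = y[train][np.argmin(dists)]
        correct += int(pred == y[i])
    return correct / n

# Synthetic stand-in for descriptor vectors (NOT real Pse-PSSM data):
# 60 samples, 100 features, 3 hypothetical classes.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 20)
centers = rng.normal(scale=3.0, size=(3, 100))
X = centers[y] + rng.normal(size=(60, 100))

mean, comps = pca_fit(X, 10)
Z = pca_transform(X, mean, comps)
print(Z.shape)  # (60, 10)
acc = jackknife_accuracy(X, y, 10)
print(f"jackknife accuracy: {acc:.3f}")
```

Refitting the projection inside the loop costs one SVD per sample, but it keeps the held-out sample fully independent of the training step.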

Cite this article

WANG Tong, SHEN Hongbin, YAO Lixiu, YANG Jie, CHOU Kuochen. PCA for predicting quaternary structure of protein. Front. Electr. Electron. Eng., 2008, 3(4): 376‒380 https://doi.org/10.1007/s11460-008-0084-5

References

1. Klotz I M, Langerman N R, Darnall D W. Quaternary structure of proteins. Annual Review of Biochemistry, 1970, 39: 25–62. doi:10.1146/annurev.bi.39.070170.000325
2. Price N C. Assembly of multi-subunit structures. In: Pain R H, ed. Mechanisms of Protein Folding. New York: Oxford University Press, 1994, 160–193
3. Chou K C, Cai Y D. Predicting protein quaternary structure by pseudo-amino acid composition. Proteins, 2003, 53(2): 282–289. doi:10.1002/prot.10500
4. Anfinsen C B. Principles that govern the folding of protein chains. Science, 1973, 181(4096): 223–230. doi:10.1126/science.181.4096.223
5. Anfinsen C B, Haber E, Sela M, et al. The kinetics of formation of native ribonuclease during oxidation of the reduced polypeptide chain. Proceedings of the National Academy of Sciences of the United States of America, 1961, 47: 1309–1314. doi:10.1073/pnas.47.9.1309
6. Garian R. Prediction of quaternary structure from primary structure. Bioinformatics, 2001, 17(6): 551–556. doi:10.1093/bioinformatics/17.6.551
7. Chou K C. Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins, 2001, 43(3): 246–255. doi:10.1002/prot.1035
8. Shen H B, Chou K C. PseAAC: a flexible web server for generating various kinds of protein pseudo amino acid composition. Analytical Biochemistry, 2008, 373(2): 386–388. doi:10.1016/j.ab.2007.10.012
9. Chou K C, Shen H B. MemType-2L: a web server for predicting membrane proteins and their types by incorporating evolution information through Pse-PSSM. Biochemical and Biophysical Research Communications, 2007, 360(2): 339–345. doi:10.1016/j.bbrc.2007.06.027
10. Wang G, Dunbrack R L Jr. PISCES: a protein sequence culling server. Bioinformatics, 2003, 19(12): 1589–1591. doi:10.1093/bioinformatics/btg224
11. Schäffer A A, Aravind L, Madden T L, et al. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Research, 2001, 29(14): 2994–3005. doi:10.1093/nar/29.14.2994
12. Chou K C. A key driving force in determination of protein structural classes. Biochemical and Biophysical Research Communications, 1999, 264(1): 216–224. doi:10.1006/bbrc.1999.1325
13. Malinowski E R, Howery D G. Factor Analysis in Chemistry. New York: John Wiley, 1980
14. Deming S N. Chemometrics: an overview. Clinical Chemistry, 1986, 32(9): 1702–1706
15. Du Q S, Jiang Z Q, He W Z, et al. Amino acid principal component analysis (AAPCA) and its applications in protein structural class prediction. Journal of Biomolecular Structure and Dynamics, 2006, 23(6): 635–640
16. Wen Y, Lu Y, Shi P F. Handwritten Bangla numeral recognition system and its application to postal automation. Pattern Recognition, 2007, 40(1): 99–107. doi:10.1016/j.patcog.2006.07.001
17. Liang Z Z, Zhang D, Shi P F. The theoretical analysis of GLRAM and its applications. Pattern Recognition, 2007, 40(3): 1032–1041. doi:10.1016/j.patcog.2006.04.038
18. Chou K C, Shen H B. Recent progress in protein subcellular location prediction. Analytical Biochemistry, 2007, 370(1): 1–16. doi:10.1016/j.ab.2007.07.006
19. Du P, He T, Li Y. Prediction of C-to-U RNA editing sites in higher plant mitochondria using only nucleotide sequence features. Biochemical and Biophysical Research Communications, 2007, 358(1): 336–341. doi:10.1016/j.bbrc.2007.04.130
20. Du P, Li Y. Prediction of protein submitochondria locations by hybridizing pseudo-amino acid composition with various physicochemical features of segmented sequence. BMC Bioinformatics, 2006, 7: 518. doi:10.1186/1471-2105-7-518
21. Huang Y, Cai J, Ji L, et al. Classifying G-protein coupled receptors with bagging classification tree. Computational Biology and Chemistry, 2004, 28(4): 275–280. doi:10.1016/j.compbiolchem.2004.08.001
22. Wang M, Yang J, Liu G P, et al. Weighted-support vector machines for predicting membrane protein types based on pseudo-amino acid composition. Protein Engineering Design and Selection, 2004, 17(6): 509–516. doi:10.1093/protein/gzh061
23. Shen H B, Chou K C. Ensemble classifier for protein fold pattern recognition. Bioinformatics, 2006, 22(14): 1717–1722. doi:10.1093/bioinformatics/btl170
24. Wang S Q, Yang J, Chou K C. Using stacked generalization to predict membrane protein types based on pseudo-amino acid composition. Journal of Theoretical Biology, 2006, 242(4): 941–946. doi:10.1016/j.jtbi.2006.05.006
25. Chou K C, Shen H B. Predicting eukaryotic protein subcellular location by fusing optimized evidence-theoretic K-nearest neighbor classifiers. Journal of Proteome Research, 2006, 5(8): 1888–1897. doi:10.1021/pr060167c
26. Denoeux T. A k-nearest neighbor classification rule based on Dempster-Shafer theory. IEEE Transactions on Systems, Man, and Cybernetics, 1995, 25(5): 804–813. doi:10.1109/21.376493
27. Keller J M, Gray M R, Givens J A Jr. A fuzzy K-nearest neighbor algorithm. IEEE Transactions on Systems, Man, and Cybernetics, 1985, 15(4): 580–585