Identification of cytokine via an improved genetic algorithm

Xiangxiang ZENG, Sisi YUAN, Xianxian HUANG, Quan ZOU

PDF(476 KB)
PDF(476 KB)
Front. Comput. Sci. ›› 2015, Vol. 9 ›› Issue (4) : 643-651. DOI: 10.1007/s11704-014-4089-3
RESEARCH ARTICLE

Identification of cytokine via an improved genetic algorithm

Author information +
History +

Abstract

With the explosive growth in the number of protein sequences generated in the postgenomic age, research into identifying cytokines from proteins and detecting their biochemical mechanisms becomes increasingly important. Unfortunately, the identification of cytokines from proteins is challenging due to a lack of understanding of the structure space provided by the proteins and the fact that only a small number of cytokines exists in massive proteins. In view of fact that a proteins sequence is conceptually similar to a mapping of words to meaning, n-gram, a type of probabilistic language model, is explored to extract features for proteins. The second challenge focused on in this work is genetic algorithms, a search heuristic that mimics the process of natural selection, that is utilized to develop a classifier for overcoming the protein imbalance problem to generate precise prediction of cytokines in proteins. Experiments carried on imbalanced proteins data set show that our methods outperform traditional algorithms in terms of the prediction ability.

Keywords

n-grams / genetic algorithm / cytokine identification / sampling / imbalanced data

Cite this article

Download citation ▾
Xiangxiang ZENG, Sisi YUAN, Xianxian HUANG, Quan ZOU. Identification of cytokine via an improved genetic algorithm. Front. Comput. Sci., 2015, 9(4): 643‒651 https://doi.org/10.1007/s11704-014-4089-3

References

[1]
Zou Q, Li X, Jiang Y, Zhao Y, Wang G. BinMemPredict: a Web server and software for predicting membrane protein types. Current Proteomics, 2013, 10(1): 2―9
CrossRef Google scholar
[2]
Yabuki Y, Muramatsu T, Hirokawa T, Mukai H, Suwa M. GRIFFIN: a system for predicting GPCR-G-protein coupling selectivity using a support vector machine and a hidden Markov model. Nucleic AcidsResearch, 2005, 33(suppl 2): W148―W153
CrossRef Google scholar
[3]
Nielsen H, Engelbrecht J, Brunak S, Heijne G V. A neural network method for identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. International Journal of Neural Systems, 1997, 8(5-6): 581―599
CrossRef Google scholar
[4]
Altschul S F, Gish W, Miller W, Myers E W, Lipman D J. Basic local alignment search tool. Journal of Molecular Biology, 1990, 215(3): 403―410
CrossRef Google scholar
[5]
Pearson W R. Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms. Genomics, 1991, 11(3): 635―650
CrossRef Google scholar
[6]
Huang N, Chen H, Sun Z. CTKPred: an SVM-based method for the prediction and classification of the cytokine superfamily. Protein Engineering Design and Selection, 2005, 18(8): 365―368
CrossRef Google scholar
[7]
Liu B, Wang X, Lin L, Tang B, Dong Q, Wang X. Prediction of protein binding sites in protein structures using hidden Markov support vector machine. BMC bioinformatics, 2009, 10(1): 381
CrossRef Google scholar
[8]
Lin C, Zou Y, Qin J, Liu X, Jiang Y, Ke C, Zou Q. Hierarchical classification of protein folds using a novel ensemble classifier. PloS one, 2013, 8(2): e56499
CrossRef Google scholar
[9]
Zou Q, Chen W, Huang Y, Liu X, Jiang Y. Identifying multi-functional enzyme by hierarchical multi-label classifier. Journal of Computational and Theoretical Nanoscience, 2013, 10(4): 1038―1043
CrossRef Google scholar
[10]
Chou K C, Shen H B. Recent advances in developing web-servers for predicting protein attributes. Natural Science, 2009, 1(2): 63―92
CrossRef Google scholar
[11]
Ganapathiraju M, Weisser D, Rosenfeld R, Carbonell J, Reddy R, Klein-Seetharaman J. Comparative n-gram analysis of whole-genome protein sequences. In: Proceedings of the 2nd International Conference on Human Language Technology Research. 2002, 76―81
CrossRef Google scholar
[12]
Srinivasan S M, Vural S, King B R, Guda C. Mining for class-specific motifs in protein sequence classification. BMC Bioinformatics, 2013, 14(1): 96
CrossRef Google scholar
[13]
Koza J R. Genetic Programming. MIT press, 1992
[14]
Sun Y, Kamel M S, Wong A K, Wang Y. Cost-sensitive boosting for classification of imbalanced data. Pattern Recognition, 2007, 40(12): 3358―3378
CrossRef Google scholar
[15]
Lewis D, Gale W. Training text classifiers by uncertainty sampling. In: Proceedings of the 14th ACM SIGIR Conference on Research and Development in Information Retrieval. 1994.
[16]
Kubat M, Holte R C, Matwin S. Machine learning for the detection of oil spills in satellite radar images. Machine learning, 1998, 30(2-3): 195―215
CrossRef Google scholar
[17]
Fawcett T. An introduction to ROC analysis. Pattern recognition letters, 2006, 27(8): 861―874
CrossRef Google scholar
[18]
Provost F J, Fawcett T. Analysis and visualization of classifier performance: comparison under imprecise class and cost distributions. In: Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining. 1997, 97: 43―48
[19]
Bateman A, Coin L, Durbin R, Finn R D, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer E L L, Studholme D J, Yeats C, Eddy, S. R. The Pfam protein families database. Nucleic Acids Research, 2004, 32: D138―D141
CrossRef Google scholar

RIGHTS & PERMISSIONS

2014 Higher Education Press and Springer-Verlag Berlin Heidelberg
AI Summary AI Mindmap
PDF(476 KB)

Accesses

Citations

Detail

Sections
Recommended

/