Pattern recognition methods in microarray based oncology study
Xuesong LU, Xuegong ZHANG
With the development of microarray technology, a growing number of microarray-based oncology studies have been carried out. The large volume of data and the complexity of cancer mechanisms make data analysis methods an increasingly important part of these studies. This article focuses on the pattern recognition methods used in oncology studies. Unsupervised and supervised methods are reviewed separately, according to the availability of sample information. Finally, some possible future directions are proposed.
Keywords: pattern recognition methods, microarray, oncology