Introduction
Microarray technology, introduced in the 1990s, enables researchers to measure the expression levels of thousands of genes simultaneously. In 1999, Golub et al. [
1] successfully applied microarray technology in a leukemia study. In the same year, Alon et al. [
2] used microarray technology in a study of colon cancer. Since then, microarray technology has become a standard tool in cancer studies, and almost all common cancers have been profiled [
1-
7]. A typical microarray experiment contains several dozen to hundreds of samples, and each sample has expression values for thousands of genes, so researchers face a situation in which the number of features is far larger than the sample size. This characteristic increases the difficulty of microarray data analysis. Over the past decade, pattern recognition methods have been applied or developed to address these problems. An overview of these methods will not only help us understand the state of the art in this area, but may also inspire better ideas for solving these problems.
In this review, we will focus on the microarray data analysis methods used in oncology studies. To interpret the microarray data analysis problem more easily, we formulate the problem as follows.
A microarray dataset can be described as a data matrix X of size N×M, where M is the number of samples and N is the number of features/genes. Each column represents a sample Si, i ∈ {1, 2, …, M}, and each row represents a gene Gi, i ∈ {1, 2, …, N}. For most problems there is also a vector Y that contains the clinical information of each sample. The clinical information can be either discrete (e.g., normal vs. cancer, or different subtypes of cancer) or continuous (e.g., survival time).
To avoid the effect of noise and shorten the computation time, researchers usually perform preliminary gene filtering before the analysis to remove non-informative genes. There are four commonly used criteria for defining non-informative genes. First, the fold change (defined as the ratio between the expression value in treated samples and that in control samples) is less than a pre-determined cutoff value; here the treated samples can be a group treated with a particular drug or a group of cancer patients, and the control samples are a different, untreated or non-cancerous group. Second, the percentage of missing data is higher than a pre-determined cutoff value. Third, the percentage of present calls is less than a pre-determined cutoff value. Last, the standard deviation is less than a pre-determined cutoff value. The choice of cutoff value is usually arbitrary, and different studies may use different combinations of criteria. For example, van’t Veer et al. filtered the genes by requiring that an informative gene show at least a two-fold difference and a p-value of less than 0.01 in more than 5 tumors [
3]. Kapp et al. removed the genes which had a standard deviation of less than 1.5 and had less than 90% of data present [
8].
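As a rough illustration of such filtering (not taken from any of the cited studies), the following Python sketch applies hypothetical versions of these criteria with numpy; the data and all cutoff values are arbitrary placeholders.

```python
import numpy as np

# Toy stand-in for a genes x samples matrix of log2 ratios, with missing values as NaN.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 60))
X[rng.random(X.shape) < 0.05] = np.nan

fold_ok    = np.nanmax(np.abs(X), axis=1) >= 1.0   # at least a two-fold change (log2 scale) somewhere
missing_ok = np.isnan(X).mean(axis=1) <= 0.10      # no more than 10% missing values
spread_ok  = np.nanstd(X, axis=1) >= 0.5           # enough variation across samples
keep = fold_ok & missing_ok & spread_ok

X_filtered = X[keep]                               # retain only the informative genes
```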
Based on whether clinical information is incorporated in the data analysis scheme, the data analysis methods can be separated into two categories: unsupervised analyses, which use only the expression data, and supervised analyses, which also utilize information beyond the expression data (typically the clinical labels). We will discuss these two categories separately.
Unsupervised analysis methods
The main purpose of this type of method is to aid class discovery, where the classes of interest may be classes of samples as well as classes of genes. One of the most commonly used unsupervised methods in microarray data analysis is hierarchical clustering [
9-
11]. Because this type of analysis clusters genes and samples step by step and generates a clustering tree that indicates the distance between genes or samples, it is often used as the first step when analyzing data, giving researchers an overall view of the dataset. For clustering methods, there are three key issues: the distance measure, the optimization function, and the evaluation of clusters. Different choices may require different solutions and may lead to different discoveries.
Distance measurement
Different genes have very different scales of expression values, so measuring distance directly in Euclidean space carries little biological meaning. In biological studies, it is more meaningful to ask whether two genes or samples share the same expression pattern (e.g., two genes that are consistently high or low in the same samples may have a similar function). For this reason, the Pearson correlation coefficient is commonly used as a distance measure [
9,
12]. Although measuring distance directly in Euclidean space may not be appropriate, standardizing the data onto the same scale or using ratio values makes such measurements more reasonable. Instead of measuring Euclidean distance on microarray data, Nilsson et al. used the Isomap algorithm to measure the geodesic distance between two points [
13], i.e., the length of the shortest path between them. Recently, with more and more gene annotation information available, Boratyn et al. proposed measuring distance based on both expression data and biological data, the latter describing the biological function of each gene [
14]. The new measure is the sum of the distance computed from expression data and the distance computed from biological data. The distance based on biological data is defined by three criteria: the distance between two genes with similar functions is smaller than that between two genes with different functions; the distance between an annotated and an un-annotated gene is smaller than that between two un-annotated genes; and the distance between two un-annotated genes is smaller than that between two genes with different functions. Although the correct answer is unknown in most oncology studies, these improvements in distance measurement provide interesting applications.
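A minimal sketch of correlation-based hierarchical clustering, assuming Python with scipy and toy data; the cited studies differ in details such as linkage choice and centering.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))              # genes x samples

# 1 - Pearson correlation as the pairwise distance between genes
dist = pdist(X, metric='correlation')
tree = linkage(dist, method='average')      # average-linkage clustering tree
gene_clusters = fcluster(tree, t=5, criterion='maxclust')   # cut the tree into 5 gene clusters
```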
Optimization function
Almost all clustering problems reduce to optimization problems. The aim is to maximize the distance between different clusters while minimizing the distance within each cluster. To this end, various optimization functions have been proposed, along with procedures for solving them. Bagirov et al. designed an optimization function that finds the cluster centroids by minimizing the sum of the deviations of all tissue samples from their nearest centroids [
15]:
f(C1, C2, …, Cq) = Σ_{j=1..M} min_{s=1..q} ‖Sj − Cs‖²,
where q is the total number of clusters and C1, C2, …, Cq are the cluster centroids. This optimization problem was solved by iteratively adding one centroid at a time. Gao et al. proposed using the sparse non-negative matrix factorization (NMF) method for clustering [
16]. In this method, the data matrix X is factored into the product of two non-negative matrices W of size N×k and H of size k×M, where k is the number of clusters. Given a sparseness control parameter λ, the optimization function combines the factorization error ‖X − WH‖² with a λ-weighted penalty that enforces sparseness in H, and it can be minimized iteratively. After H has been computed, sample j is placed in cluster i if hij is the largest entry in column j.
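The cluster-assignment step can be sketched as follows, using scikit-learn's plain (non-sparse) NMF as a stand-in for the sparse variant used by Gao et al.; the data and the number of clusters are toy assumptions.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.random((2000, 40))                  # non-negative genes x samples matrix
k = 3                                       # assumed number of clusters

model = NMF(n_components=k, init='nndsvd', max_iter=500, random_state=0)
W = model.fit_transform(X)                  # N x k basis matrix
H = model.components_                       # k x M coefficient matrix

clusters = H.argmax(axis=0)                 # sample j joins the cluster i with the largest h_ij
```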
Sometimes other background information is available. Although such information is not sufficient for a supervised analysis, it can help the clustering process if it is used appropriately. Sese et al. added additional features to each sample; the optimization function then maximizes the interclass variance under the restriction that only splits explainable by a common feature are allowed [
17]. Dotan-Cohen et al. proposed a hierarchical tree snipping method which incorporates the gene annotation information from Gene Ontology (GO) to obtain clusters that are substantially enriched in genes participating in the same biological process [
18].
Since a gene can function in several pathways and a sample can have multiple clinical statuses, restricting each sample or gene to a single cluster may lead to a loss of information. Fuzzy clustering methods have been applied in this area (e.g., Belacel et al. successfully applied fuzzy
K-means and fuzzy
J-means clustering to breast cancer data [
19]).
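A bare-bones fuzzy c-means sketch in plain numpy, shown as a generic formulation rather than the fuzzy K-means/J-means implementations of Belacel et al.; the data, fuzzifier m, and cluster count are assumptions.

```python
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """X: samples x features. Returns the soft membership matrix U (samples x c) and the centers."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)                    # each sample's memberships sum to one
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]   # membership-weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                            # avoid division by zero
        U_new = d ** (-2.0 / (m - 1.0))
        U_new /= U_new.sum(axis=1, keepdims=True)        # standard fuzzy membership update
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return U, centers

U, centers = fuzzy_c_means(np.random.default_rng(1).normal(size=(60, 50)), c=3)
```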
The optimization functions discussed above consider either genes or samples at a time, but sometimes genes and samples need to be considered simultaneously. The coupled two-way clustering (CTWC) method introduced by Getz et al. addresses this issue [
20,
21]. The method performs clustering iteratively: it first clusters genes and samples separately, and then repeatedly clusters the sub-matrices defined by pairs of clusters from the gene and sample dimensions. Kluger et al. applied spectral bi-clustering methods to tackle this problem [
22]. They model the expression of each gene as xij = eij + ρi + χj, where eij is the base expression level for gene i in sample j, ρi is the tendency of gene i to be expressed in all samples, and χj is the tendency of all genes to be expressed in sample j. So the objective in the simultaneous clustering of genes and samples is, given the data matrix, to find the underlying block structure of E.
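scikit-learn ships a spectral biclustering implementation that can serve as a rough stand-in for this kind of simultaneous clustering; the toy data and block counts below are assumptions, and the details differ from Kluger et al.'s method.

```python
import numpy as np
from sklearn.cluster import SpectralBiclustering

rng = np.random.default_rng(0)
X = rng.random((200, 40)) + 0.1            # genes x samples, strictly positive values

model = SpectralBiclustering(n_clusters=(3, 2), method='log', random_state=0)
model.fit(X)
gene_blocks = model.row_labels_            # block label for each gene
sample_blocks = model.column_labels_       # block label for each sample
```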
Evaluation of clusters
Since there is no gold standard for unsupervised problems, it is hard to evaluate clustering results, especially when the number of clusters is unknown. Usually, researchers choose the number of clusters based on their knowledge of the data, or they try several possible numbers and select those that yield obvious patterns. Such choices are commonly made for hierarchical clustering,
K-means clustering, self-organizing map (SOM), etc. [
9-
12,
23-
25]. Currently, there are many methods that focus on estimating the number of clusters and evaluating the clusters. Some of them work in an iterative way. For example, Hsu et al. initially separated the samples into
k classes, identified the differentially expressed genes (i.e., genes whose expression values have different distributions in the different classes), and built a classifier to predict the label of each sample. Finally, they iteratively refined the clustering results [
26]. This method can both suggest the number of clusters and generate stable clusters. Li et al. also provided a program called SamCluster to iteratively discover stable clusters [
27].
Some of the evaluation methods are based on re-sampling [
28-
32]. The basic idea of these methods is to re-sample several datasets from the original data, obtain the clustering results for each, and then evaluate the stability of the clustering with some statistic. Instead of a re-sampling technique, Swift et al. used a set of different clustering methods to generate clusters and evaluate their stability [
33].
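A minimal resampling-stability check, assuming Python with scikit-learn; the statistic used here (adjusted Rand index against a reference clustering) is only one of several possibilities explored in the cited methods.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 500))                         # samples x genes
k = 3
reference = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

scores = []
for b in range(50):                                    # 50 resampled datasets
    idx = rng.choice(len(X), size=len(X), replace=True)
    labels = KMeans(n_clusters=k, n_init=10, random_state=b).fit_predict(X[idx])
    scores.append(adjusted_rand_score(reference[idx], labels))

print(np.mean(scores))                                 # high average agreement suggests stable clusters
```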
Besides clustering methods, some unsupervised methods are used in oncology studies to reduce the dimensionality of the data, such as principal component analysis (PCA), singular value decomposition (SVD), and independent component analysis (ICA) [
12,
34-
37].
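For example, projecting the samples onto the first two principal components takes only a few lines; this is a generic sketch on toy data, not a reproduction of any cited analysis.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 40))            # genes x samples

pca = PCA(n_components=2)
sample_coords = pca.fit_transform(X.T)     # each of the 40 samples mapped to 2 dimensions
print(pca.explained_variance_ratio_)       # share of variance captured by each component
```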
Supervised analysis methods
In microarray studies, the number of features/genes is almost always far larger than the sample size, so reducing the feature dimension is usually the first step in a supervised analysis. In this section, we review the dimension reduction methods and the subsequent analysis methods separately.
Dimension reduction
Reducing the feature dimension not only avoids over-fitting in model building, but also saves computation time and makes the data easier to interpret. Feature selection is a commonly used approach; its purpose is to remove the less informative features and keep the more informative ones. The most common way to judge whether a gene is informative in microarray studies is to test whether it is differentially expressed between classes. Feature selection methods can usually be separated into two categories: filter methods and wrapper methods.
Filter methods evaluate each feature based only on the data. In most cases, the features are ranked by some statistic and, given a cutoff value, the top-ranked features are selected. If we assume that the expression values of a gene in two classes follow two Gaussian distributions with different means, the t statistic can be used to evaluate how well each gene discriminates the two classes [
38-
42]. Some other statistical methods are used for selecting differentially expressed genes, such as Wilcoxon rank test [
41-
43] or F-test [
44]. Besides these statistical tests, several other measures are applied in the feature selection process: the signal-to-noise statistic, defined as (μ1i − μ2i)/(σ1i + σ2i), where μki is the mean value of gene i in class k and σki is its standard deviation in class k [1,45,46]; the ratio of the between-class sum of squares to the within-class sum of squares [46]; the correlation between the gene expression Gi and the class label Y [42,47]; and entropy-based measures [48]. Because most filter methods consider only one gene at a time, they fail to capture the combined effect of several genes. Bø et al. proposed a method to evaluate the discriminating ability of a pair of genes [
49].
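A filter-style ranking with Welch's t statistic can be sketched as follows; the toy data and the cutoff of 100 genes are arbitrary assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 40))            # genes x samples
y = np.array([0] * 20 + [1] * 20)          # class label per sample

# per-gene t statistics and p-values comparing the two classes
t, p = stats.ttest_ind(X[:, y == 0], X[:, y == 1], axis=1, equal_var=False)
top_genes = np.argsort(p)[:100]            # keep the 100 genes with the smallest p-values
```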
Wrapper methods embed feature selection in the model optimization process. Usually, the performance of the selected features is evaluated by the classification error rate. Since feature selection is an NP-hard problem and the number of features in microarray studies is very large, many optimization methods have been applied to this problem. Support vector machine recursive feature elimination (SVM-RFE) [
50], recursive-SVM [
51], and entropy-based recursive feature elimination (E-RFE) [
52] all use similar search strategies to reach a near-optimal result. In each iteration, the classifier is trained on the remaining features, the features are ranked according to some criterion, and one or more of the worst-performing features are eliminated; this process is repeated until a stopping criterion is reached. Heuristic algorithms such as sequential forward selection (SFS), sequential forward floating selection (SFFS), genetic algorithms (GA), and evolutionary algorithms (EA) have all been combined with different classifiers to select features, for example: SFS with Fisher’s linear discriminant analysis (LDA) [
53], SFS with IB1, Naïve Bayes, C4.5 and CN2 [
54], SFS, SFFS with LDA, logistic regression and SVM [
55], GA with SVM [
56,
57], GA with
k-nearest neighbor (
k-NN) [
58], GA with a maximum likelihood classification method [
59], EA with
k-NN [
60,
61], EA with LDA [
61].
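A wrapper-style selection in the spirit of SVM-RFE can be sketched with scikit-learn's generic RFE wrapped around a linear SVM; the toy data, step size, and number of retained genes are assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2000))            # samples x genes
y = rng.integers(0, 2, size=40)

# eliminate the 10% lowest-ranked genes per iteration until 20 remain
rfe = RFE(estimator=SVC(kernel='linear'), n_features_to_select=20, step=0.1)
rfe.fit(X, y)
selected_genes = np.flatnonzero(rfe.support_)
```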
Besides the filter and wrapper methods, there are feature selection methods that rank features during the model training process and select the important ones; Saeys et al. called these embedded methods [
62]. Krishnapuram et al. used a sparse Bayesian approach to model the problem, and then a set of important features were selected based on the weight of each feature [
63]. Also, BLogReg [
64] adopted a similar approach to this problem.
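An embedded-style selection can be sketched with an L1-penalized logistic regression, in which genes with non-zero weights are retained; this is a generic stand-in, not the sparse Bayesian model or BLogReg themselves, and the regularization strength is an assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2000))            # samples x genes
y = rng.integers(0, 2, size=40)

clf = LogisticRegression(penalty='l1', solver='liblinear', C=0.1)
clf.fit(X, y)
selected_genes = np.flatnonzero(clf.coef_[0])   # genes surviving the sparsity penalty
```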
Similar to the PCA approach, partial least squares (PLS) has also been used for dimension reduction when sample labels are available [
65,
66].
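PLS-based dimension reduction can be sketched with scikit-learn; the toy data and the number of components are assumptions.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2000))            # samples x genes
y = rng.integers(0, 2, size=40)            # class labels used to guide the projection

pls = PLSRegression(n_components=3)
pls.fit(X, y)
components = pls.transform(X)              # 3 label-informed components per sample
```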
Class prediction
Class prediction problems for microarray data are similar to class prediction problems in other areas, so most of the classic class prediction methods have been applied to microarray data analysis: LDA projects the samples onto a one-dimensional space that maximizes the distance between classes while minimizing the distance within each class [
46];
k-NN predicts the test sample based on the class labels of the
k samples near the test sample [
37,
45,
46,
58,
60]; a decision tree is a classifier with a tree structure in which each node either assigns the label of the test sample or specifies a test that selects which sub-tree to follow [
67]; SVM aims to find an optimal hyper-plane that separates the two classes while maximizing the distance between the hyper-plane and the data points closest to it [
24,
47,
50-
52,
56,
57]; and artificial neural networks take genes as input nodes and the class label as the output node, learning the parameters that connect the nodes of different layers in order to predict unknown samples [
68-
70]. Modifications of the classic methods have also been applied to this problem. For example, Linder et al. proposed the subsequent artificial neural network (ANN) method, which uses two levels of ANNs: the first level acts as a pre-selection step that narrows down the possible labels of the test sample, and the second-level ANN gives the final prediction [
71].
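The classic classifiers mentioned above are all available off the shelf; a minimal, hedged comparison on toy, pre-filtered data might look like the following sketch (not a reproduction of any cited study).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 100))             # samples x (already filtered) genes
y = rng.integers(0, 2, size=80)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

for clf in (LinearDiscriminantAnalysis(),
            KNeighborsClassifier(n_neighbors=5),
            SVC(kernel='linear')):
    acc = clf.fit(X_tr, y_tr).score(X_te, y_te)
    print(type(clf).__name__, round(acc, 2))
```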
Besides these classic methods, there are other methods used in microarray class prediction, such as the weighted voting scheme introduced by Golub et al. [
1]. The basic idea of this method is that each selected informative gene votes for the class whose typical expression the test sample is closer to, and the sample is assigned to the class that receives more votes. Van’t Veer et al. successfully used a correlation-based classification method on a breast cancer dataset [
3]. West et al. applied a Bayesian regression model to predict the clinical status of breast cancer patients [
35]. Zhang et al. introduced a vector α = {α1, α2, …, αn} to indicate whether the class label is switched, e.g., αi = 1 indicates that the label of sample i is switched. With this new variable and a regression method, Zhang et al. tried to find the mislabeled samples in an iterative way [
72]. Gevaert et al. proposed three ways to integrate clinical data into Bayesian network learning: full integration, in which clinical data and expression data are combined to train a single Bayesian network; decision integration, in which two Bayesian networks are trained separately and their predictions are combined; and partial integration, in which the network structures are learned separately and then merged for parameter learning and prediction. Their results show that only partial integration and decision integration perform significantly better than either data source alone [
73].
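A sketch of a Golub-style weighted voting rule for two classes, with the signal-to-noise statistic as the gene weight; this is an illustrative reading of the scheme described above, not the original implementation, and the choice of weight and midpoint is an assumption.

```python
import numpy as np

def weighted_vote(X_train, y_train, x_test):
    """X_train: informative genes x samples; y_train: 0/1 labels; x_test: expression of one sample."""
    mu0 = X_train[:, y_train == 0].mean(axis=1)
    mu1 = X_train[:, y_train == 1].mean(axis=1)
    s0 = X_train[:, y_train == 0].std(axis=1)
    s1 = X_train[:, y_train == 1].std(axis=1)
    weight = (mu0 - mu1) / (s0 + s1)                 # signal-to-noise weight per gene
    midpoint = (mu0 + mu1) / 2.0
    votes = weight * (x_test - midpoint)             # positive votes favour class 0
    v0, v1 = votes[votes > 0].sum(), -votes[votes < 0].sum()
    return 0 if v0 > v1 else 1
```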
Survival analysis is a specific class of class prediction problem. Unlike the problems we have discussed above, the class label here is a continuous value. Several regression-based methods have been applied to this kind of data analysis [
74-
78]. All these methods are derived from the same basic model
λ(t | x) = λ0(t)·exp(β1x1 + β2x2 + … + βnxn),
where λ0(t) is a baseline hazard function, β = {β1, β2, …, βn} are the regression coefficients, and x = {x1, x2, …, xn} are the expression levels of the sample studied. Depending on their focus, these methods modify the model in different ways. Goeman et al. added the assumption that the regression coefficients of the genes are random and independent, with mean zero and a common variance τ2 [
74]. Kaderali et al. proposed to solve the regression problem with a hierarchical Bayesian approach which introduced a prior distribution for regression coefficients [
77]. One advantage of regression analysis is that it can handle different sources of data simultaneously (e.g., Fernandez-Teijeiro et al. incorporated both gene expression and clinical information in their survival analysis [
79]).
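A minimal survival-regression sketch using the lifelines package (assumed to be installed); the ridge penalty below is only one of several ways the cited methods constrain the coefficients when genes outnumber samples, and all data are toy values.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n_samples, n_genes = 80, 20                               # a handful of pre-selected genes
df = pd.DataFrame(rng.normal(size=(n_samples, n_genes)),
                  columns=[f'gene_{i}' for i in range(n_genes)])
df['time'] = rng.exponential(scale=24, size=n_samples)    # follow-up time
df['event'] = rng.integers(0, 2, size=n_samples)          # 1 = event observed, 0 = censored

cph = CoxPHFitter(penalizer=0.5)          # shrink coefficients, cf. the random-coefficient assumption above
cph.fit(df, duration_col='time', event_col='event')
risk = cph.predict_partial_hazard(df)     # relative risk score per sample
```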
Though feature selection and sophisticated analysis methods can to some extent avoid over-fitting and improve classification results, we cannot be sure whether improved results are a consequence of improved methods or of chance. Several approaches aim to evaluate the analysis results or to increase the stability of the classifier. Testing the classifier on an independent dataset is usually an unbiased way to measure its performance. If an independent test set is unavailable, cross validation is a common way to estimate classifier performance: n-fold cross validation means separating the original dataset into n subsets and, in each round, using one subset as the test set and the others as the training set; leave-one-out cross validation means leaving out a single sample as the test set each time and training on the others. In microarray data analysis, a feature selection step usually precedes classification, and there are two schemes for evaluating performance when feature selection is used. As defined by Zhang et al. [
51], CV1 is the scheme in which feature selection is performed using all the samples and cross validation is applied only to the classification step, whereas CV2 applies cross validation to both feature selection and classification, i.e., the test set is held out first and feature selection and classification are performed with the remaining samples. Because the CV1 scheme uses all the samples for feature selection, which leaks information into the classification step, its cross validation result will be optimistic, while that of CV2 is more faithful. Although cross validation provides a way to evaluate the performance of data analysis methods, there is still large variation in cross validation results. A permutation method is often used, in which the sample labels are permuted to estimate the null distribution of a statistic or of the classification accuracy; it is a non-parametric way to test the significance of an analysis result [
45,
80,
81]. Boosting is used to increase the accuracy and stability by increasing the weight of misclassified samples [
43,
82], and bagging serves the same purpose by combining several classifiers trained on bootstrapped datasets [
82]. There are also some methods to detect possible mislabeled samples in an iterative way [
72,
83].
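The difference between the CV1 and CV2 schemes can be made concrete with scikit-learn; on pure-noise data, the CV1 estimate is typically optimistic while CV2 stays near chance. This is a toy sketch with arbitrary parameter choices.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5000))                  # 60 samples x 5000 genes of pure noise
y = np.array([0, 1] * 30)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# CV1: feature selection on ALL samples, cross validation only for the classifier (optimistic)
X_sel = SelectKBest(f_classif, k=50).fit_transform(X, y)
cv1 = cross_val_score(SVC(kernel='linear'), X_sel, y, cv=cv).mean()

# CV2: the test fold is held out before feature selection (more faithful)
pipe = Pipeline([('select', SelectKBest(f_classif, k=50)),
                 ('clf', SVC(kernel='linear'))])
cv2 = cross_val_score(pipe, X, y, cv=cv).mean()
print(cv1, cv2)
```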
Biology behind data
Simply clustering the data, selecting some differentially expressed genes, or building a classifier is not always enough for microarray analysis. Sometimes we need to know more about the biology behind the data, such as which pathway or biological function is associated with a specific type of cancer. As more gene functions become known, methods have been proposed to analyze expression data together with known gene annotations.
Gamberoni et al. proposed a novel approach that studies the correlations between genes and their relation to Gene Ontology (GO) [
84]. The method first selects significantly correlated pairs of genes, and then counts the number of gene pairs linked to the same GO term. Finally, a bootstrap analysis is used to evaluate whether the number of gene pairs linked to the same GO term is significantly higher than that expected from a simulation study. Subramanian et al. proposed a method called gene set enrichment analysis (GSEA) to evaluate whether a pre-defined group of genes has different expression patterns in two classes of samples [
85]. Similar to the GSEA approach, Al-Shahrour et al. proposed a method to find groups of genes that have common functional labels which are significantly over- or under-expressed as a block [
86].
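The core significance calculation behind many annotation-based analyses is a simple over-representation test; a hypergeometric version can be sketched as follows. The counts are hypothetical, and GSEA itself uses a rank-based statistic rather than this test.

```python
from scipy.stats import hypergeom

N_universe = 10000    # genes on the array
n_term     = 150      # genes annotated with a given GO term
n_selected = 200      # differentially expressed genes
k_overlap  = 12       # selected genes carrying that GO term

# probability of seeing at least k_overlap annotated genes by chance
p_value = hypergeom.sf(k_overlap - 1, N_universe, n_term, n_selected)
print(p_value)
```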
Discussion
Microarray data analysis is a biology-driven problem; different biological purposes require different analysis methods. Although the existing methods have met many data analysis needs, several directions still need more attention. For example, most of the existing classification methods cannot use clinical data and expression data simultaneously; this raises the question of how to effectively remove the dominant factor in the dataset.