Pattern recognition methods in microarray based oncology study

Xuesong LU, Xuegong ZHANG

Front. Electr. Electron. Eng., 2009, Vol. 4, Issue 3: 243-250. DOI: 10.1007/s11460-009-0041-y
REVIEW ARTICLE

Abstract

As microarray technology has developed, a growing number of microarray-based oncology studies have been carried out. The large volume of data and the complexity of cancer mechanisms make data analysis methods an increasingly important part of these studies. This article focuses on the pattern recognition methods used in oncology studies. Unsupervised and supervised methods are reviewed separately, according to whether sample information is available. Finally, some possible future directions are proposed.

Keywords

pattern recognition methods / microarray / oncology

Cite this article

Xuesong LU, Xuegong ZHANG. Pattern recognition methods in microarray based oncology study. Front Elect Electr Eng Chin, 2009, 4(3): 243‒250 https://doi.org/10.1007/s11460-009-0041-y

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 60575014, 30625012 and 60721003) and the National High-tech R&D Program (No. 2006AA02Z325).

RIGHTS & PERMISSIONS

2014 Higher Education Press and Springer-Verlag Berlin Heidelberg