Frontiers of Optoelectronics

Front. Optoelectron.    2017, Vol. 10 Issue (3) : 273-279     DOI: 10.1007/s12200-017-0726-4
RESEARCH ARTICLE
Recursive feature elimination in Raman spectra with support vector machines
Bernd KAMPE1, Sandra KLOß1,2, Thomas BOCKLITZ1,2, Petra RÖSCH1,2, Jürgen POPP1,2,3
1. Institute of Physical Chemistry and Abbe Center of Photonics, University of Jena, Helmholtzweg 4, D-07743 Jena, Germany
2. InfectoGnostics Research Campus Jena, Center for Applied Research, Philosophenweg 7, 07743 Jena, Germany
3. Leibniz-Institute of Photonic Technology, Albert-Einstein-Straße 9, D-07745 Jena, Germany
Abstract

The presence of irrelevant and correlated data points in a Raman spectrum can degrade classifier performance. We introduce support vector machine (SVM)-based recursive feature elimination into the field of Raman spectroscopy and demonstrate its performance on a data set of spectra of clinically relevant microorganisms in urine samples, along with patient samples. As the original technique is only suitable for two-class problems, we adapt it to the multi-class setting. We show that a large number of spectral points can be removed without notably degrading the prediction accuracy of the resulting model.
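
The elimination loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `train_linear_svm` and `svm_rfe` are hypothetical, the linear SVM is fitted with plain subgradient descent instead of a dedicated solver, and the multi-class step (summing squared one-vs-rest weights per feature) is only one plausible adaptation, assumed here because the abstract does not specify it. The data are synthetic stand-ins for spectra, with two informative and three noise features.

```python
import random


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit a linear SVM (hinge loss + L2 penalty) by batch subgradient descent.

    X is a list of feature rows, y a list of labels in {-1, +1}.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # sample violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, labels):
    """Return feature indices ranked from most to least important."""
    classes = sorted(set(labels))
    # Two classes need one model; more classes use one-vs-rest (an assumption).
    targets = classes if len(classes) > 2 else classes[:1]
    active = list(range(len(X[0])))  # features still in the model
    eliminated = []                  # features in order of removal
    while len(active) > 1:
        Xa = [[row[j] for j in active] for row in X]
        scores = [0.0] * len(active)
        for c in targets:
            y = [1 if lab == c else -1 for lab in labels]
            w, _ = train_linear_svm(Xa, y)
            # Ranking criterion: squared weight, summed over the binary models.
            scores = [s + wj * wj for s, wj in zip(scores, w)]
        worst = min(range(len(active)), key=scores.__getitem__)
        eliminated.append(active.pop(worst))
    return active + eliminated[::-1]


# Synthetic stand-in for spectra: features 0 and 1 separate three classes,
# features 2 to 4 are pure noise.
rng = random.Random(0)
centers = {"A": (4.0, 0.0), "B": (-2.0, 3.5), "C": (-2.0, -3.5)}
X, labels = [], []
for name, (cx, cy) in centers.items():
    for _ in range(20):
        informative = [cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)]
        noise = [rng.gauss(0, 0.5) for _ in range(3)]
        X.append(informative + noise)
        labels.append(name)

ranking = svm_rfe(X, labels)
print("features ranked best to worst:", ranking)
```

The sketch removes one feature per iteration and retrains each time, which is the simplest variant. For real spectra with many hundreds of points, removing several features per step and monitoring a cross-validated error rate during removal would be the natural extensions.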

Keywords: feature selection, Raman spectroscopy, pattern recognition, chemometrics
Corresponding Authors: Jürgen POPP   
Just Accepted Date: 21 June 2017   Online First Date: 14 July 2017    Issue Date: 26 September 2017
 Cite this article:   
Bernd KAMPE,Sandra KLOß,Thomas BOCKLITZ, et al. Recursive feature elimination in Raman spectra with support vector machines[J]. Front. Optoelectron., 2017, 10(3): 273-279.
 URL:  
http://journal.hep.com.cn/foe/EN/10.1007/s12200-017-0726-4
http://journal.hep.com.cn/foe/EN/Y2017/V10/I3/273
Bacterial species    Spectra in training set    Spectra in validation set
E. faecalis          429                        53
E. faecium           256                        50
S. epidermidis       227                        47
S. haemolyticus      225                        52
S. hominis           207                        49
S. saprophyticus     237                        30
S. aureus            285                        50
E. coli              360                        37
K. pneumoniae        233                        40
P. aeruginosa        249                        39
P. mirabilis         244                        67
Tab.1  Numbers of spectra per data set
Fig.1  Sample preprocessed Raman spectrum of E. faecalis showing the induced ranking of the method
Fig.2  Development of the cross-validation error rate on the training data set over the course of feature removal
Fig.3  Development of the classification error rate on the independent validation data set over the course of feature removal
Fig.4  Development of the classification error rate on the data set of patient samples over the course of feature removal
Fig.5  Trend of the accuracy of the model discriminating E. faecalis from E. coli based on cross-validation