Voice activity detection based on deep belief networks using likelihood ratio

Sang-Kyun Kim, Young-Jin Park, Sangmin Lee

Journal of Central South University, 2016, Vol. 23, Issue (1): 145-149.

DOI: 10.1007/s11771-016-3057-5

Abstract

A novel technique is proposed to improve the performance of voice activity detection (VAD) by using deep belief networks (DBN) with a likelihood ratio (LR). The likelihood ratio is derived from the speech and noise spectral components, which are assumed to follow a Gaussian probability density function (PDF). The proposed algorithm calculates the likelihood ratio from the input signal and employs DBN learning to classify voice activity. Experiments show that the proposed algorithm yields improved results in various noise environments compared to conventional VAD algorithms. Furthermore, the DBN-based algorithm decreases the probability of detection error by [0.7, 2.6] compared to the support vector machine based algorithm.
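
To make the likelihood ratio mentioned in the abstract concrete, the following is a minimal Python sketch of the per-bin log likelihood ratio commonly used in statistical-model-based VAD, where each spectral component is modeled as zero-mean complex Gaussian under both the noise-only and speech-plus-noise hypotheses. The abstract does not give the exact formulation, so the formula, the function names, and the idea of passing the per-bin values on as classifier features are assumptions for illustration, not the authors' implementation. The noise and speech variances are taken as given here; in practice they would have to be estimated.

```python
import math


def log_likelihood_ratios(power_spectrum, noise_var, speech_var):
    """Per-bin log likelihood ratio under an assumed complex Gaussian model.

    power_spectrum: |X_k|^2 for each frequency bin of one frame
    noise_var:      noise variance per bin (assumed known or estimated elsewhere)
    speech_var:     speech variance per bin (assumed known or estimated elsewhere)
    """
    llrs = []
    for p, lam_n, lam_s in zip(power_spectrum, noise_var, speech_var):
        xi = lam_s / lam_n   # a priori SNR
        gamma = p / lam_n    # a posteriori SNR
        # log of  exp(gamma * xi / (1 + xi)) / (1 + xi)
        llrs.append(gamma * xi / (1.0 + xi) - math.log1p(xi))
    return llrs


def mean_log_likelihood_ratio(power_spectrum, noise_var, speech_var):
    """Average of the per-bin log likelihood ratios (log of their geometric mean)."""
    llrs = log_likelihood_ratios(power_spectrum, noise_var, speech_var)
    return sum(llrs) / len(llrs)


if __name__ == "__main__":
    # Hypothetical two-bin frame: one energetic bin, one silent bin.
    frame = [2.0, 0.0]
    noise = [1.0, 1.0]
    speech = [1.0, 1.0]
    print(log_likelihood_ratios(frame, noise, speech))
    print(mean_log_likelihood_ratio(frame, noise, speech))
```

In a conventional detector the averaged value would be compared against a threshold; in a learning-based scheme such as the one described above, the per-bin values could instead serve as the feature vector for the classifier.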

Keywords

voice activity detection / likelihood ratio / deep belief networks

Cite this article

Sang-Kyun Kim, Young-Jin Park, Sangmin Lee. Voice activity detection based on deep belief networks using likelihood ratio. Journal of Central South University, 2016, 23(1): 145-149. DOI: 10.1007/s11771-016-3057-5

