Improved speech absence probability estimation based on environmental noise classification

Young-ho Son; Sang-min Lee

doi:10.1007/s11771-012-1309-6

Journal of Central South University, 2012, Vol. 19, Issue 9: 2548-2553. DOI: 10.1007/s11771-012-1309-6

Abstract

An improved speech absence probability estimation method for speech enhancement was proposed, based on environmental noise classification. A relevant noise estimation approach, known as the speech presence uncertainty tracking method, requires the a priori probability of speech absence, which is derived from the microphone input signal and the noise signal using the estimated a posteriori signal-to-noise ratio (SNR). The conventional approach computes this probability with a fixed threshold and a fixed smoothing parameter, regardless of the noise type. To overcome this limitation, first, the optimal parameter values in terms of perceived speech quality are derived for a variety of noise types. Second, these optimal values are assigned according to the noise type determined by a real-time noise classification algorithm based on the Gaussian mixture model (GMM). The proposed algorithm thus estimates the speech absence probability with the optimal parameters for each noise type. Performance was evaluated by objective tests, namely the perceptual evaluation of speech quality (PESQ) and a composite measure, and by a subjective test, namely mean opinion scores (MOS), under various noise environments. The proposed method shows better results than existing methods.
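The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The parameter table `PARAMS`, the toy single-component GMMs, the two-dimensional features, and the recursive update rule (smoothing a hard speech-absence decision obtained by thresholding the a posteriori SNR) are all assumptions made for illustration; the abstract does not give the actual values, features, or update equation.

```python
import numpy as np

# Hypothetical per-noise-type parameters: (a posteriori SNR threshold, smoothing factor).
# Placeholder numbers only; the paper derives its own optimal values.
PARAMS = {"white": (0.8, 0.95), "babble": (1.2, 0.90)}

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM.

    weights: (K,), means: (K, D), variances: (K, D), x: (D,)
    """
    x = np.asarray(x, dtype=float)
    # Log density of each Gaussian component at x.
    log_comp = -0.5 * (np.log(2.0 * np.pi * variances)
                       + (x - means) ** 2 / variances).sum(axis=1)
    a = np.log(weights) + log_comp
    # Log-sum-exp over components, for numerical stability.
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

def classify_noise(x, models):
    """Return the noise type whose GMM gives x the highest likelihood."""
    return max(models, key=lambda name: gmm_log_likelihood(x, *models[name]))

def update_speech_absence_prob(q_prev, gamma, threshold, alpha):
    """Assumed update: smooth a per-bin hard decision (gamma < threshold => absent)."""
    absent = (np.asarray(gamma, dtype=float) < threshold).astype(float)
    return alpha * np.asarray(q_prev, dtype=float) + (1.0 - alpha) * absent

# Toy models: one Gaussian component per noise type in a 2-D feature space.
models = {
    "white": (np.array([1.0]), np.array([[0.0, 0.0]]), np.array([[1.0, 1.0]])),
    "babble": (np.array([1.0]), np.array([[3.0, 3.0]]), np.array([[1.0, 1.0]])),
}

noise_type = classify_noise([2.9, 3.2], models)   # "babble"
threshold, alpha = PARAMS[noise_type]             # (1.2, 0.90)
q = update_speech_absence_prob([0.5, 0.5], [0.5, 4.0], threshold, alpha)
# q is approximately [0.55, 0.45]
```

In this sketch the classifier only selects which threshold and smoothing factor to use; the probability update itself is unchanged across noise types, which mirrors the contrast the abstract draws with a fixed-parameter approach.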

Keywords

speech enhancement / soft decision / speech absence probability / Gaussian mixture model (GMM)

Cite this article

Young-ho Son, Sang-min Lee. Improved speech absence probability estimation based on environmental noise classification. Journal of Central South University, 2012, 19(9): 2548-2553. DOI: 10.1007/s11771-012-1309-6
