Mapping methods for output-based objective speech quality assessment using data mining

Jing Wang , Sheng-hui Zhao , Xiang Xie , Jing-ming Kuang

Journal of Central South University, 2014, Vol. 21, Issue 5: 1919-1926. DOI: 10.1007/s11771-014-2138-6

Abstract

Objective speech quality is difficult to measure without the input reference speech. Mapping methods based on data mining are investigated and designed to improve output-based speech quality assessment. The degraded speech is first separated into three classes (unvoiced, voiced and silence); the consistency between the degraded speech signal and a pre-trained reference model is then calculated for each class and mapped to an objective speech quality score using data mining. A fuzzy Gaussian mixture model (GMM) trained on perceptual linear predictive (PLP) features is used to generate the artificial reference model. Mean opinion score (MOS) mapping methods, including multivariate non-linear regression (MNLR), fuzzy neural network (FNN) and support vector regression (SVR), are designed and compared with the standard ITU-T P.563 method. Experimental results show that the assessment methods with data mining perform better than ITU-T P.563. Moreover, FNN and SVR are more efficient than MNLR, and FNN performs best, with a 14.50% increase in the correlation coefficient and a 32.76% decrease in the root-mean-square MOS error.
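
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the consistency measure is assumed here to be the average per-frame log-likelihood of the feature frames under a diagonal-covariance GMM, and all function names (`gmm_avg_loglik`, `pearson`, `rmse`, `relative_change_percent`) and toy values are hypothetical. The sketch shows how a per-class consistency value, the correlation coefficient, the root-mean-square MOS error and a relative improvement over a baseline might be computed.

```python
import math


def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    Assumption: this stands in for the per-class consistency measure;
    the abstract does not give the exact formula.
    """
    total = 0.0
    for frame in frames:
        comp_logs = []
        for w, mu, var in zip(weights, means, variances):
            log_p = math.log(w)
            for x, m, v in zip(frame, mu, var):
                log_p += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
            comp_logs.append(log_p)
        # Log-sum-exp over mixture components for numerical stability.
        peak = max(comp_logs)
        total += peak + math.log(sum(math.exp(c - peak) for c in comp_logs))
    return total / len(frames)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def rmse(predicted, target):
    """Root-mean-square error between predicted and subjective MOS."""
    n = len(predicted)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)) / n)


def relative_change_percent(baseline, value):
    """Signed relative change of `value` with respect to `baseline`, in percent."""
    return 100.0 * (value - baseline) / baseline


if __name__ == "__main__":
    # Toy two-dimensional "feature frames" and a two-component GMM (made-up values).
    frames = [[0.1, -0.2], [0.4, 0.3], [-0.3, 0.0]]
    weights = [0.6, 0.4]
    means = [[0.0, 0.0], [0.5, 0.5]]
    variances = [[1.0, 1.0], [0.5, 0.5]]
    print("consistency:", gmm_avg_loglik(frames, weights, means, variances))

    # Toy subjective and predicted MOS values (made-up values).
    subjective = [1.8, 2.5, 3.1, 3.9, 4.4]
    predicted = [2.0, 2.4, 3.3, 3.6, 4.5]
    print("correlation:", pearson(predicted, subjective))
    print("RMSE:", rmse(predicted, subjective))
```

In such a setup, one consistency value per class (unvoiced, voiced, silence) would form the input vector to a regressor such as MNLR, FNN or SVR, and `relative_change_percent` would compare the resulting correlation or error against a baseline such as ITU-T P.563.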

Keywords

objective speech quality / data mining / multivariate non-linear regression / fuzzy neural network / support vector regression

Cite this article

Jing Wang, Sheng-hui Zhao, Xiang Xie, Jing-ming Kuang. Mapping methods for output-based objective speech quality assessment using data mining. Journal of Central South University, 2014, 21(5): 1919-1926. DOI: 10.1007/s11771-014-2138-6
