A robust feature extraction approach based on an auditory model for classification of speech and expressiveness

Ying Sun, V. Werner, Xue-ying Zhang

Journal of Central South University ›› 2012, Vol. 19 ›› Issue (2): 504-510. DOI: 10.1007/s11771-012-1032-3
Abstract

Based on an auditory model, the zero-crossings with maximal Teager energy operator (ZCMT) feature extraction approach is described and then applied to speech and emotion recognition. Three kinds of experiments were carried out. The first consists of isolated-word recognition experiments on neutral (non-emotional) speech. The results show that the ZCMT approach improves recognition accuracy by 3.47% on average compared with the Teager energy operator (TEO), so the ZCMT feature can be regarded as a noise-robust feature for speech recognition. The second consists of mono-lingual emotion recognition experiments using the Taiyuan University of Technology (TYUT) and Berlin databases. With an average recognition rate of 82.19%, the results indicate that ZCMT features characterize speech emotions effectively. The third consists of cross-lingual experiments with three languages. As the accuracy of the ZCMT approach decreases by only 1.45%, the results indicate that ZCMT features characterize emotions in a language-independent way.
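
The abstract names two ingredients, the discrete Teager energy operator and zero-crossing analysis, which the Python sketch below combines. It is a minimal illustration under an assumption, not the authors' implementation: following a ZCPA-style scheme, the peak amplitude between successive upward zero-crossings is replaced by the maximal Teager energy within that interval, and the names teager_energy, zcmt_frame_feature and the parameter fs are hypothetical.

import numpy as np

def teager_energy(x):
    # Discrete Teager energy operator: psi[x(n)] = x(n)^2 - x(n-1)*x(n+1)
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi

def zcmt_frame_feature(frame, fs=16000):
    # For one band-passed frame, pair each upward zero-crossing interval
    # (an instantaneous-frequency estimate, fs / interval) with the maximal
    # Teager energy observed inside that interval (illustrative sketch only).
    psi = teager_energy(frame)
    zc = np.where((frame[:-1] < 0) & (frame[1:] >= 0))[0]  # upward crossings
    pairs = []
    for a, b in zip(zc[:-1], zc[1:]):
        freq = fs / (b - a)        # inverse interval -> frequency in Hz
        peak = psi[a:b].max()      # maximal Teager energy in the interval
        pairs.append((freq, peak))
    return pairs

In a complete ZCPA/ZCMT-style front end, such (frequency, energy) pairs would be accumulated into perceptually spaced frequency bins (e.g., on a Bark or ERB scale) across multiple band-pass channels to form the frame feature vector; that binning stage is omitted here for brevity.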

Keywords

speech recognition / emotion recognition / zero-crossings / Teager energy operator / speech database

Cite this article

Ying Sun, V. Werner, Xue-ying Zhang. A robust feature extraction approach based on an auditory model for classification of speech and expressiveness. Journal of Central South University, 2012, 19(2): 504-510. DOI: 10.1007/s11771-012-1032-3


