A novel approach for speaker diarization system using TMFCC parameterization and Lion optimization

V. Subba Ramaiah, R. Rajeswara Rao

Journal of Central South University, 2017, Vol. 24, Issue 11: 2649-2663.

DOI: 10.1007/s11771-017-3678-3

Abstract

In an audio stream containing multiple speakers, speaker diarization helps determine “who spoke when”. The task is unsupervised because no prior information about the speakers is available. Diarization labels the speech signal according to speaker identity; that is, the input audio stream is partitioned into speaker-homogeneous segments. In this work, we present a novel speaker diarization system that uses the tangent weighted Mel frequency cepstral coefficient (TMFCC) as the feature parameter and the Lion algorithm to cluster the voice-activity-detected audio streams into speaker groups. Both main tasks of speaker indexing, i.e., speaker segmentation and speaker clustering, are thereby improved. The TMFCC makes more effective use of both low-energy and high-energy frames, which improves the performance of the proposed system. The proposed system is evaluated on audio signals from the ELSDSR corpus containing three, four, and five speakers. With tracking distance and tracking time as the evaluation metrics, the experimental results show that the speaker diarization system with TMFCC parameterization and Lion-based clustering outperforms existing diarization systems, achieving 95% tracking accuracy.
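As a rough illustration of the pipeline the abstract outlines (voice activity detection, frame-level parameterization, then clustering of speech frames into speaker groups), the following minimal Python sketch may help. It is not the authors' method: the TMFCC weighting and the Lion algorithm are not specified in this abstract, so a plain log-magnitude spectrum stands in for the features and a small k-means loop stands in for the clustering. All function names, thresholds, and the synthetic two-tone test signal are assumptions made only for this example.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Slice a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def energy_vad(frames, rel_threshold=0.2):
    """Mark a frame as speech if its energy exceeds a fraction of the maximum.
    (Assumed simple detector, not the detector used in the paper.)"""
    energy = np.mean(frames ** 2, axis=1)
    return energy > rel_threshold * energy.max()

def log_spectrum(frames):
    """Stand-in feature: log(1 + |FFT|) of Hann-windowed frames (NOT TMFCC)."""
    window = np.hanning(frames.shape[1])
    return np.log1p(np.abs(np.fft.rfft(frames * window, axis=1)))

def kmeans(features, k, iters=20):
    """Stand-in clustering (NOT the Lion algorithm), farthest-point init."""
    centers = [features[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(features - c, axis=1) for c in centers], axis=0)
        centers.append(features[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        dist = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels

def diarize(x, n_speakers):
    """Return one label per frame: -1 for non-speech, else a cluster index."""
    frames = frame_signal(x)
    speech = energy_vad(frames)
    labels = np.full(len(frames), -1)
    labels[speech] = kmeans(log_spectrum(frames[speech]), n_speakers)
    return labels

def toy_signal(sr=8000):
    """Hypothetical input: two 'speakers' (200 Hz and 1200 Hz tones) and a pause."""
    t = np.arange(4000) / sr
    return np.concatenate([np.sin(2 * np.pi * 200 * t),
                           np.zeros(2000),
                           np.sin(2 * np.pi * 1200 * t)])

labels = diarize(toy_signal(), n_speakers=2)
```

On the toy signal, frames from the pause are labelled -1 and the two tones fall into different clusters. A real system would replace the stand-in feature and clustering steps with the TMFCC parameterization and Lion-based clustering described in the paper.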

Keywords

speaker diarization / Mel frequency cepstral coefficient / i-vector extraction / Lion algorithm


