Binary neural networks for speech recognition

Yan-min QIAN, Xu XIANG

Front. Inform. Technol. Electron. Eng., 2019, 20(5): 701-715. DOI: 10.1631/FITEE.1800469
Original Article

Abstract

Recently, deep neural networks (DNNs) have significantly outperformed Gaussian mixture models in acoustic modeling for speech recognition. However, the substantial increase in computational load during the inference stage makes deep models difficult to deploy directly on low-power embedded devices. To alleviate this issue, structured sparsity and low-precision fixed-point quantization have been applied widely. In this work, binary neural networks for speech recognition are developed to reduce the computational cost during the inference stage. A fast implementation of binary matrix multiplication is introduced. On modern central processing unit (CPU) and graphics processing unit (GPU) architectures, a 5–7 times speedup compared with full-precision floating-point matrix multiplication can be achieved in real applications. Several kinds of binary neural networks and related model optimization algorithms are developed for large vocabulary continuous speech recognition acoustic modeling. In addition, to improve the accuracy of binary models, knowledge distillation from the normal full-precision floating-point model to the compressed binary model is explored. Experiments on the standard Switchboard speech recognition task show that the proposed binary neural networks can deliver a 3–4 times speedup over the normal full-precision deep models. With knowledge distillation from the normal floating-point models, the binary DNNs or binary convolutional neural networks (CNNs) can restrict the word error rate (WER) degradation to within 15.0%, compared to the normal full-precision floating-point DNNs or CNNs, respectively. In particular, for the binary CNN with binarization only on the convolutional layers, the WER degradation is very small and almost negligible with the proposed approach.
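
As a rough illustration of how binary matrix multiplication can be reduced to XOR and population-count operations (the technique named in the abstract and keywords), the following Python/NumPy sketch computes the dot product of two sign-binarized vectors from their packed bit representations. It is a minimal sketch under assumed conventions (bit 1 encodes +1, bit 0 encodes -1); the names binarize and binary_dot are hypothetical and this is not the authors' optimized CPU/GPU kernel.

import numpy as np

def binarize(x):
    # Sign binarization to {+1, -1}, stored as packed bits (1 bit per element, 1 means +1).
    bits = (x >= 0).astype(np.uint8)
    return np.packbits(bits)

def binary_dot(a_packed, b_packed, n):
    # For {+1, -1} vectors of length n: dot = n - 2 * popcount(a XOR b),
    # since XOR marks exactly the positions where the two signs disagree.
    disagreements = int(np.unpackbits(np.bitwise_xor(a_packed, b_packed)).sum())
    return n - 2 * disagreements

# Quick check against the full-precision dot product of the sign vectors.
n = 1024
a, b = np.random.randn(n), np.random.randn(n)
sa, sb = np.where(a >= 0, 1, -1), np.where(b >= 0, 1, -1)
assert binary_dot(binarize(a), binarize(b), n) == int(sa @ sb)

A full binary matrix multiplication applies this dot product row by column; the speedups reported in the abstract come from replacing floating-point multiply-adds with word-level XOR and hardware population-count instructions (e.g., POPCNT on CPUs, __popc on GPUs).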

Keywords

Speech recognition / Binary neural networks / Binary matrix multiplication / Knowledge distillation / Population count
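
The knowledge distillation listed above transfers knowledge from the full-precision teacher to the compressed binary student, as described in the abstract. A minimal PyTorch sketch of one common distillation objective is given below; it assumes frame-level classification with soft teacher targets, and the temperature T and interpolation weight alpha are hypothetical hyperparameters rather than values taken from the paper.

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
    # Hard-label cross-entropy on the ground-truth targets.
    hard = F.cross_entropy(student_logits, targets)
    # KL divergence pulling the student's softened outputs toward the teacher's.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so the gradient magnitude is comparable across temperatures
    return alpha * hard + (1.0 - alpha) * soft

During training of the binary model, the teacher is the normal floating-point DNN or CNN run in inference mode, and only the binary student's parameters are updated.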

Cite this article

Yan-min QIAN, Xu XIANG. Binary neural networks for speech recognition. Front. Inform. Technol. Electron. Eng., 2019, 20(5): 701-715. DOI: 10.1631/FITEE.1800469

RIGHTS & PERMISSIONS

Zhejiang University and Springer-Verlag GmbH Germany, part of Springer Nature

Supplementary files

FITEE-0701-19007-YMQ_suppl_1

FITEE-0701-19007-YMQ_suppl_2
