Audio-guided self-supervised learning for disentangled visual speech representations
Dalu FENG, Shuang YANG, Shiguang SHAN, Xilin CHEN
[1] Shi B, Hsu W N, Lakhotia K, Mohamed A. Learning audio-visual speech representation by masked multimodal cluster prediction. In: Proceedings of the 10th International Conference on Learning Representations. 2022
[2] Hsu W N, Shi B. u-HuBERT: unified mixed-modal speech pretraining and zero-shot transfer to unlabeled modality. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 1538
[3] Stafylakis T, Tzimiropoulos G. Combining residual networks with LSTMs for lipreading. In: Proceedings of the 18th Annual Conference of the International Speech Communication Association. 2017, 3652–3656
[4] Ma P, Martinez B, Petridis S, Pantic M. Towards practical lipreading with distilled and efficient models. In: Proceedings of 2021 IEEE International Conference on Acoustics, Speech and Signal Processing. 2021, 7608–7612
[5] Ma P, Wang Y, Shen J, Petridis S, Pantic M. Lip-reading with densely connected temporal convolutional networks. In: Proceedings of 2021 IEEE Winter Conference on Applications of Computer Vision. 2021, 2856–2865
[6] Koumparoulis A, Potamianos G. Accurate and resource-efficient lipreading with EfficientNetV2 and transformers. In: Proceedings of 2022 IEEE International Conference on Acoustics, Speech and Signal Processing. 2022, 8467–8471
[7] Ma P, Petridis S, Pantic M. End-to-end audio-visual speech recognition with conformers. In: Proceedings of 2021 IEEE International Conference on Acoustics, Speech and Signal Processing. 2021, 7613–7617
[8] Ma P, Petridis S, Pantic M. Visual speech recognition for multiple languages in the wild. Nature Machine Intelligence, 2022, 4(11): 930–939
[9] Ma P, Haliassos A, Fernandez-Lopez A, Chen H, Petridis S, Pantic M. Auto-AVSR: audio-visual speech recognition with automatic labels. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. 2023, 1–5
[10] Yang Y, Zhuang Y, Pan Y. Multiple knowledge representation for big data artificial intelligence: framework, applications, and case studies. Frontiers of Information Technology & Electronic Engineering, 2021, 22(12): 1551–1558