Fine-grained sequence-to-sequence lip reading based on self-attention and self-distillation

Junxiao XUE; Shibo HUANG; Huawei SONG; Lei SHI

doi:10.1007/s11704-023-2230-x

Front. Comput. Sci., 2023, Vol. 17, Issue 6: 176344. DOI: 10.1007/s11704-023-2230-x

Artificial Intelligence

LETTER

Junxiao XUE, Shibo HUANG, Huawei SONG, Lei SHI. Fine-grained sequence-to-sequence lip reading based on self-attention and self-distillation. Front. Comput. Sci., 2023, 17(6): 176344. DOI: 10.1007/s11704-023-2230-x
