GANet: multi-modal adaptation continuous sign language recognition via gloss-aware network

Qi Chu; Shuang Xu; Yuehang Wang; Yongji Zhang; Qianren Guo; Hongde Qin; Yu Jiang

doi:10.20517/ces.2025.79

Complex Engineering Systems, 2026, Vol. 6, Issue (2): -7. DOI: 10.20517/ces.2025.79

Research Article

Abstract

Continuous sign language recognition (CSLR) aims to model the temporal evolution of visual gestures in order to recognize continuous semantic units, a task of great significance for deaf communication assistance and intelligent human–computer interaction. While existing methods emphasize local segment modeling and long-range dependency capture, they often overlook the critical role of global semantic context in understanding the video as a whole, an oversight that conflicts with the inherently context-dependent nature of sign language. Moreover, sign language videos frequently contain many visually similar but semantically meaningless motions. These misleading segments are easily mistaken for valid glosses, which degrades recognition accuracy. To address these challenges, we propose GANet (Gloss-Aware Network), a novel CSLR framework with cross-modal input adaptability. Inspired by the hierarchical "book–chapter–content" structure, GANet explicitly models global context to guide local understanding while effectively suppressing irrelevant motion noise. Specifically, we introduce a Global Context Modeling Module that captures semantic patterns across frames, together with an auxiliary task that strengthens the model's ability to learn high-level structural semantics. In addition, we propose a Gloss-Aware Module that leverages global semantics to model the spatiotemporal occurrence of glosses, thereby improving the recognition of meaningful gestures. Extensive experiments on multiple benchmark datasets demonstrate that GANet outperforms existing methods, validating its effectiveness, robustness, and broad adaptability to both RGB (red, green, and blue) and event-based data.
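The abstract does not give GANet's equations. As a rough illustration of the general idea of letting clip-level (global) context re-weight frame-level (local) features, so that motion which disagrees with the overall context is down-weighted, here is a minimal pure-Python sketch. It is not the authors' Global Context Modeling Module or Gloss-Aware Module: the mean-pooled context vector, the cosine-similarity score, the sigmoid gate, and the `temperature` parameter are all hypothetical choices made for illustration.

```python
import math
from typing import List

Vector = List[float]


def global_context(frames: List[Vector]) -> Vector:
    """Hypothetical clip-level context: average per-frame features over time.
    Assumes a non-empty list of equal-length feature vectors."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def context_gate(frames: List[Vector], temperature: float = 0.1) -> List[float]:
    """Per-frame relevance weight in (0, 1). Frames that agree with the
    clip-level context score above the clip's mean similarity and get
    weights above 0.5; outliers get weights near 0. `temperature` (> 0)
    is a hypothetical sharpness parameter."""
    g = global_context(frames)
    sims = [cosine(f, g) for f in frames]
    mean_sim = sum(sims) / len(sims)
    # Centre each score on the mean similarity so the gate is relative to this clip.
    return [1.0 / (1.0 + math.exp(-(s - mean_sim) / temperature)) for s in sims]


def gated_features(frames: List[Vector], temperature: float = 0.1) -> List[Vector]:
    """Softly suppress off-context frames by scaling their features by the gate."""
    weights = context_gate(frames, temperature)
    return [[w * x for x in f] for w, f in zip(weights, frames)]


if __name__ == "__main__":
    # Three mutually similar frames plus one dissimilar (e.g. transitional) frame.
    clip = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0]]
    print(context_gate(clip))
```

In this toy clip the dissimilar last frame receives a weight close to zero, while the three consistent frames stay above 0.5. A trained network would replace the fixed mean pooling and cosine score with learned layers; the sketch only shows the direction of the effect that the abstract describes.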

Keywords

Continuous sign language recognition / event camera / multi-modal adaptation

Cite this article

Qi Chu, Shuang Xu, Yuehang Wang, Yongji Zhang, Qianren Guo, Hongde Qin, Yu Jiang. GANet: multi-modal adaptation continuous sign language recognition via gloss-aware network. Complex Engineering Systems, 2026, 6(2): -7. DOI: 10.20517/ces.2025.79
