GANet: multi-modal adaptation continuous sign language recognition via gloss-aware network
Qi Chu, Shuang Xu, Yuehang Wang, Yongji Zhang, Qianren Guo, Hongde Qin, Yu Jiang
Complex Engineering Systems, 2026, Vol. 6, Issue (2) -7.
Continuous sign language recognition (CSLR) aims to model the temporal evolution of visual gestures to recognize continuous semantic units, which is of great significance for applications in deaf communication assistance and intelligent human–computer interaction. While existing methods emphasize local segment modeling and long-range dependency capture, they often overlook the critical role of global semantic context in overall video comprehension—an oversight that contradicts the inherently context-dependent nature of sign language. Moreover, sign language videos frequently contain a large number of visually similar but semantically meaningless motions. These misleading segments are easily misperceived as valid glosses, thereby degrading recognition accuracy. To address these challenges, we propose GANet (Gloss-Aware Network), a novel CSLR framework with cross-modal input adaptability. Inspired by the hierarchical structure of "book–chapter–content", GANet explicitly models global context to guide local understanding while effectively suppressing irrelevant motion noise. Specifically, we introduce a Global Context Modeling Module to capture semantic patterns across frames and an auxiliary task to enhance the model's ability to learn high-level structural semantics. In addition, we propose a Gloss-Aware Module that leverages global semantics to model the spatiotemporal occurrence of glosses, thereby improving the recognition of meaningful gestures. Extensive experiments on multiple benchmark datasets demonstrate that GANet outperforms existing methods, validating its effectiveness, robustness, and broad adaptability to both RGB (red, green, and blue) and event-based data.
Continuous sign language recognition / event camera / multi-modal adaptation