Shared-weight multimodal translation model for recognizing Chinese variant characters
Yuankang SUN, Bing LI, Lexiang LI, Peng YANG, Dongmei YANG
Front. Inform. Technol. Electron. Eng., 2025, Vol. 26, Issue 7: 1066-1082.
Recognizing Chinese variant characters addresses the semantic ambiguity and confusion that such characters introduce, which pose risks to the security of Web content and complicate the governance of sensitive words. Most existing approaches focus on acquiring contextual knowledge from Chinese corpora and vocabularies during pretraining and often overlook the inherent phonological and morphological characteristics of the Chinese language. To address this issue, we propose a shared-weight multimodal translation model (SMTM) based on multimodal information of Chinese characters, which integrates the phonology of Pinyin and the morphology of fonts into each Chinese character token to learn the deeper semantics of variant text. Specifically, we encode the Pinyin features of Chinese characters with an embedding layer and extract their font features directly with convolutional neural networks. Because the source and target sentences in the Chinese variant-character-recognition task are similar across modalities, we design a shared-weight embedding mechanism that uses heuristic information from the source sentences to generate target sentences during training. The simulation results show that the proposed SMTM achieves 89.550% on the bilingual evaluation understudy (BLEU) metric and 79.480% on the F1 metric, a significant improvement over state-of-the-art baseline models.
Chinese variant characters / Multimodal model / Translation model / Phonology and morphology
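The embedding scheme summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `SharedMultimodalEmbedding`, the embedding width `DIM`, the toy 3x3 bitmaps, the 2x2 kernels, the toneless Pinyin keys, and the fusion by element-wise summation are all assumptions made for illustration. The sketch only shows the general idea of combining a character embedding, a Pinyin embedding, and convolutional glyph features into one token vector, and of reusing the same weights for the source (variant) and target (standard) sentences.

```python
import random

DIM = 4  # hypothetical embedding width, chosen only for this sketch


def make_table(keys, dim, rng):
    """Build a randomly initialized lookup table: key -> vector of length dim."""
    return {k: [rng.uniform(-1, 1) for _ in range(dim)] for k in keys}


def conv2d_valid(image, kernel):
    """Plain 2D 'valid' cross-correlation of a bitmap with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out


def glyph_features(bitmap, kernels):
    """One feature per kernel: ReLU followed by global max pooling."""
    feats = []
    for kernel in kernels:
        response = conv2d_valid(bitmap, kernel)
        feats.append(max(0.0, max(max(r) for r in response)))
    return feats


class SharedMultimodalEmbedding:
    """Hypothetical embedder; one instance serves both source and target."""

    def __init__(self, chars, pinyins, glyphs, dim=DIM, seed=0):
        rng = random.Random(seed)
        self.char_table = make_table(chars, dim, rng)
        self.pinyin_table = make_table(pinyins, dim, rng)
        # One 2x2 kernel per output feature (assumed sizes).
        self.kernels = [[[rng.uniform(-1, 1) for _ in range(2)]
                         for _ in range(2)] for _ in range(dim)]
        self.glyphs = glyphs

    def embed(self, char, pinyin):
        c = self.char_table[char]
        p = self.pinyin_table[pinyin]
        g = glyph_features(self.glyphs[char], self.kernels)
        # Fusion by summation is an assumption of this sketch.
        return [ci + pi + gi for ci, pi, gi in zip(c, p, g)]


# Toy 3x3 bitmaps standing in for rendered glyphs (not real font data).
glyphs = {
    "薇": [[1, 1, 1], [0, 1, 0], [1, 0, 1]],
    "微": [[0, 1, 1], [0, 1, 0], [1, 0, 1]],
    "信": [[1, 0, 1], [1, 1, 1], [1, 0, 1]],
}
emb = SharedMultimodalEmbedding(glyphs.keys(), ["wei", "xin"], glyphs)

# The same embedder (same weights) encodes the variant source and the target.
source_vecs = [emb.embed(c, p) for c, p in [("薇", "wei"), ("信", "xin")]]
target_vecs = [emb.embed(c, p) for c, p in [("微", "wei"), ("信", "xin")]]
```

In this toy pair, the variant character 薇 and the standard character 微 share the Pinyin entry `wei`, so their token vectors have one modality in common even though the characters differ. Because a single embedder is used on both sides, a character that appears unchanged in source and target (信 here) maps to the same vector in both.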
Zhejiang University Press