Sign language data quality improvement based on dual information streams

Jialiang Cai , Tiantian Yuan

Optoelectronics Letters ›› 2025, Vol. 21 ›› Issue (6): 342-347. DOI: 10.1007/s11801-025-3137-6
Abstract

Sign language datasets are essential for sign language recognition and translation (SLRT). Current public sign language datasets are small and lack diversity, and therefore fall short of the practical application requirements of SLRT. However, building a large-scale and diverse sign language dataset is difficult because sign language data on the Internet is scarce, and some of the data collected in the process does not meet quality standards. This paper proposes a two-information-stream transformer (TIST) model that judges whether the quality of a sign language sample is qualified. To verify that TIST effectively improves sign language recognition (SLR), we construct two datasets: a screened dataset and an unscreened dataset. Using visual alignment constraint (VAC) as the baseline model, the experiments show that the screened dataset achieves a lower word error rate (WER) than the unscreened dataset.
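The screened and unscreened datasets are compared by word error rate (WER), the standard metric for continuous SLR: the word-level edit distance between the predicted gloss sequence and the reference, normalized by the reference length. A minimal illustrative sketch of that metric (not the paper's evaluation code; the function name and pure-Python implementation are our own):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dp[i - 1][j] + 1
            insertion = dp[i][j - 1] + 1
            dp[i][j] = min(substitution, deletion, insertion)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

wer("want eat apple", "want apple")  # one deletion over three words: 1/3
```

Lower WER on the screened dataset thus indicates that filtering out low-quality samples improves recognition.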

Cite this article

Jialiang Cai, Tiantian Yuan. Sign language data quality improvement based on dual information streams. Optoelectronics Letters, 2025, 21(6): 342-347 DOI:10.1007/s11801-025-3137-6


References

[1]

MIN Y, HAO A, CHAI X, et al. Visual alignment constraint for continuous sign language recognition[C]//2021 IEEE/CVF International Conference on Computer Vision, October 10–17, 2021, Montreal, QC, Canada. New York: IEEE, 2021: 11542-11551.

[2]

CHENG K L, YANG Z, CHEN Q, et al. Fully convolutional networks for continuous sign language recognition[C]//European Conference on Computer Vision, August 23–28, 2020, Glasgow, UK. Heidelberg: Springer, 2020: 697-714.

[3]

NIU Z, MAK B. Stochastic fine-grained labeling of multi-state sign glosses for continuous sign language recognition[C]//European Conference on Computer Vision, August 23–28, 2020, Glasgow, UK. Heidelberg: Springer, 2020: 172-186.

[4]

KOLLER O, FORSTER J, NEY H. Continuous sign language recognition: towards large vocabulary statistical recognition systems handling multiple signers[J]. Computer vision and image understanding, 2015, 141: 108-125.

[5]

ZHOU H, ZHOU W, QI W, et al. Improving sign language translation with monolingual data by sign back-translation[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 20–25, 2021, Nashville, TN, USA. New York: IEEE, 2021: 1316-1325.

[6]

ZHANG J, ZHOU W, XIE C, et al. Chinese sign language recognition with adaptive HMM[C]//2016 IEEE International Conference on Multimedia and Expo (ICME), July 11–15, 2016, Seattle, WA, USA. New York: IEEE, 2016: 1-6.

[7]

CHEN Y, ZUO R, WEI F, et al. Two-stream network for sign language recognition and translation[EB/OL]. (2022-11-02) [2023-12-12]. https://arxiv.org/abs/2211.01367.

[8]

CAMGOZ N C, HADFIELD S, KOLLER O, et al. Neural sign language translation[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 18–23, 2018, Salt Lake City, UT, USA. New York: IEEE, 2018: 7784-7793.

[9]

ZHOU H, ZHOU W, ZHOU Y, et al. Spatial-temporal multi-cue network for sign language recognition and translation[J]. IEEE transactions on multimedia, 2021, 24: 768-779.

[10]

DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]//2009 IEEE Conference on Computer Vision and Pattern Recognition, June 20–25, 2009, Miami, FL, USA. New York: IEEE, 2009: 248-255.

[11]

GONG S, SHI Y, JAIN A. Low quality video face recognition: multi-mode aggregation recurrent network (MARN)[C]//2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), October 27–28, 2019, Seoul, Korea (South). New York: IEEE, 2019.

[12]

ITTNER D J, LEWIS D D, AHN D D. Text categorization of low quality images[C]//Symposium on Document Analysis and Information Retrieval, 1995, Las Vegas, NV, USA. 1995: 301-315.

[13]

FEICHTENHOFER C, FAN H, MALIK J, et al. SlowFast networks for video recognition[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV), October 27–November 2, 2019, Seoul, Korea (South). New York: IEEE, 2019: 6202-6211.

[14]

WANG W, ZHENG V W, YU H, et al. A survey of zero-shot learning: settings, methods, and applications[J]. ACM transactions on intelligent systems and technology (TIST), 2019, 10(2): 1-37.

[15]

YU J, GAO H, CHEN Y, et al. Adaptive spatiotemporal representation learning for skeleton-based human action recognition[J]. IEEE transactions on cognitive and developmental systems, 2022, 14(4): 1654-1665.

[16]

HE X, DENG K, WANG X, et al. LightGCN: simplifying and powering graph convolution network for recommendation[C]//Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, July 25–30, 2020, Virtual. New York: ACM, 2020: 639-648.

[17]

WU F, SOUZA A, ZHANG T, et al. Simplifying graph convolutional networks[EB/OL]. (2019-02-19) [2023-12-12]. https://arxiv.org/abs/1902.07153.

[18]

GENG X, LI Y, WANG L, et al. Spatiotemporal multi-graph convolution network for ride-hailing demand forecasting[J]. Proceedings of the AAAI conference on artificial intelligence, 2019, 33(1): 3656-3663.

[19]

YAN S, XIONG Y, LIN D. Spatial temporal graph convolutional networks for skeleton-based action recognition[J]. Proceedings of the AAAI conference on artificial intelligence, 2018, 32(1): 6665-7655.

[20]

LAMPERT C H, NICKISCH H, HARMELING S. Learning to detect unseen object classes by between-class attribute transfer[C]//2009 IEEE Conference on Computer Vision and Pattern Recognition, June 20–25, 2009, Miami, FL, USA. New York: IEEE, 2009: 951-958.

[21]

CHEN Z, HUANG Y, CHEN J, et al. Duet: cross-modal semantic grounding for contrastive zero-shot learning[EB/OL]. (2022-07-04) [2023-12-12]. https://arxiv.org/abs/2207.01328.

[22]

YANG Z, LI K, GAN H, et al. HD-GCN: a hybrid diffusion graph convolutional network[EB/OL]. (2023-03-31) [2023-12-12]. https://arxiv.org/abs/2303.17966.

[23]

CHEN Y, ZHANG Z, YUAN C, et al. Channel-wise topology refinement graph convolution for skeleton-based action recognition[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV), October 10–17, 2021, Montreal, QC, Canada. New York: IEEE, 2021: 13359-13368.

[24]

YANG C, XU Y, SHI J, et al. Temporal pyramid network for action recognition[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 14–19, 2020, Seattle, WA, USA. New York: IEEE, 2020: 591-600.

[25]

LI J, HAN B, JIANG M. Anomaly monitoring and early warning of electric moped charging device with infrared image[J]. Optoelectronics letters, 2025, 21(3): 136-141.

RIGHTS & PERMISSIONS

Tianjin University of Technology
