Tibetan Data Augmentation via GAN-Based Handwritten Text Generation

Dorje Tashi, Bingtian Chen, Tianying Sheng, Yongbin Yu, Xiangxiang Wang, Jin Zhang, Lobsang Yeshi, Rinchen Dongrub, Thupten Tsering, Nyima Tashi

CAAI Transactions on Intelligence Technology, 2026, Vol. 11, Issue 1: 55-65.

DOI: 10.1049/cit2.70078
ORIGINAL RESEARCH

Abstract

Growing awareness of Tibetan cultural preservation, together with technological advances, has spurred significant academic research on Tibetan. However, the structural complexity of the Tibetan language and the scarcity of labelled handwriting data impede progress in Optical Character Recognition (OCR) and other applications. To address these challenges, this paper proposes a Tibetan data augmentation technique that uses Generative Adversarial Networks (GANs) to synthesise handwriting images of arbitrary content in variable calligraphic styles, based on the given inputs. The method also applies a Real-Fake Cross Inputs Strategy during training to increase generation diversity and to improve the model's ability to generate handwritten text beyond the training set and the pre-defined corpus. The model was trained on three Tibetan handwriting datasets: Umê-style numerals, Uchen-style consonants, and Khyug-yig-style words. Experimental results show that the model generates realistic and recognisable Tibetan numeral and consonant handwriting, achieving Fréchet Inception Distance (FID) scores of 14.45 and 27.63, respectively. The method's effectiveness for augmenting OCR models is evidenced by a reduced OCR Word Error Rate (WER) on the augmented datasets.
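
For readers unfamiliar with the WER metric reported above, the following is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance between a reference transcription and an OCR hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenisation are assumptions made for illustration. The paper's own evaluation code and its tokenisation of Tibetan text are not described here and may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are separated by whitespace. Real Tibetan text
    would need a script-aware tokeniser, which is not shown here.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr

    return prev[-1] / len(ref_words)


if __name__ == "__main__":
    # Placeholder Latin tokens stand in for recognised words.
    print(word_error_rate("a b c d", "a x c"))
```

Under this definition, a lower WER after adding synthetic samples to the training data means the recogniser makes fewer word-level substitutions, insertions, and deletions per reference word.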

Keywords

computer vision / deep learning / handwriting recognition / image representation / OCR

Cite this article

Dorje Tashi, Bingtian Chen, Tianying Sheng, Yongbin Yu, Xiangxiang Wang, Jin Zhang, Lobsang Yeshi, Rinchen Dongrub, Thupten Tsering, Nyima Tashi. Tibetan Data Augmentation via GAN-Based Handwritten Text Generation. CAAI Transactions on Intelligence Technology, 2026, 11(1): 55-65. DOI: 10.1049/cit2.70078


Funding

This study was supported in part by the National Science and Technology Major Project under Grant 2022ZD0116100, and in part by the Graduate High-level Talent Training Programme of Tibet University under Grant 2022-GSP-B023.

Conflicts of Interest

The authors declare no conflicts of interest.

Data Availability Statement

The datasets used in the experiments were publicly accessible during the study period and can be found in online repositories.

