RE2: improving Chinese grammatical error correction via retrieving appropriate examples with explanation

Baoxin WANG , Yumeng LUO , Yixuan WANG , Dayong WU , Wanxiang CHE , Shijin WANG

Front. Comput. Sci. ›› 2025, Vol. 19 ›› Issue (12) : 1912381

Artificial Intelligence
RESEARCH ARTICLE


Abstract

The primary objective of Chinese grammatical error correction (CGEC) is to detect and correct errors in Chinese sentences. Recent research has applied large language models (LLMs) to CGEC with promising results. For LLMs, selecting appropriate reference examples can further improve performance. However, existing methods predominantly rely on text similarity for example retrieval, a strategy that frequently mismatches actual error patterns and retrieves lexically similar yet grammatically irrelevant sentences. To address this problem, we propose RE2, a method that retrieves appropriate examples with explanations of grammatical errors. Instead of relying on the text similarity of the input sentence, we use explanations of grammatical errors to select reference examples, which LLMs then use to improve CGEC performance. We conduct experiments on two CGEC datasets and construct a high-quality grammatical error explanation (GEE) dataset, which not only supports our study but also serves as a valuable resource for future research in both CGEC and GEE. Experimental results on both datasets show that the proposed method effectively improves CGEC performance.
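The core idea in the abstract — ranking candidate few-shot examples by the similarity of their grammatical-error *explanations* rather than by the surface text of the input sentence — can be sketched as follows. This is an illustrative sketch only: the function names, the toy example pool, and the token-overlap similarity (a stand-in for the retrievers the paper would use, such as BM25 or dense embeddings) are all assumptions, not the paper's actual implementation.

```python
# Sketch of explanation-based example retrieval for few-shot CGEC prompting.
# All names and data are illustrative; the paper's retriever and prompt
# format may differ.

def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity, a simple stand-in for a real retriever."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def retrieve_examples(input_explanation: str, example_pool: list, k: int = 2) -> list:
    """Rank pool entries by explanation similarity and return the top k.

    example_pool: list of dicts with 'source', 'target', 'explanation' keys.
    """
    ranked = sorted(
        example_pool,
        key=lambda ex: jaccard(input_explanation, ex["explanation"]),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(input_sentence: str, examples: list) -> str:
    """Assemble a few-shot correction prompt from the retrieved examples."""
    lines = ["Correct the grammatical errors in the sentence."]
    for ex in examples:
        lines.append(f"Input: {ex['source']}\nOutput: {ex['target']}")
    lines.append(f"Input: {input_sentence}\nOutput:")
    return "\n\n".join(lines)

# Toy pool: each entry pairs an erroneous/corrected sentence with an
# explanation of its error (in the paper, explanations come from an LLM).
pool = [
    {"source": "s1", "target": "t1",
     "explanation": "missing measure word before the noun"},
    {"source": "s2", "target": "t2",
     "explanation": "wrong word order of adverb and verb"},
    {"source": "s3", "target": "t3",
     "explanation": "missing measure word after the numeral"},
]

# Explanation generated for the new input sentence.
expl = "missing measure word before the noun phrase"
top = retrieve_examples(expl, pool, k=2)
print([ex["source"] for ex in top])  # → ['s1', 's3']: measure-word examples rank first
prompt = build_prompt("input sentence", top)
```

The point of the design is that the two measure-word examples rank first even if their surface text shares no words with the input sentence, which is exactly the mismatch that text-similarity retrieval suffers from.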

Keywords

grammatical error correction / large language model / grammatical error explanation

Cite this article

Baoxin WANG, Yumeng LUO, Yixuan WANG, Dayong WU, Wanxiang CHE, Shijin WANG. RE2: improving Chinese grammatical error correction via retrieving appropriate examples with explanation. Front. Comput. Sci., 2025, 19(12): 1912381 DOI:10.1007/s11704-025-41399-w



RIGHTS & PERMISSIONS

Higher Education Press
