RE2: improving Chinese grammatical error correction via retrieving appropriate examples with explanation
Baoxin WANG, Yumeng LUO, Yixuan WANG, Dayong WU, Wanxiang CHE, Shijin WANG
Front. Comput. Sci., 2025, Vol. 19, Issue 12: 1912381
Chinese grammatical error correction (CGEC) aims to detect and correct errors in Chinese sentences. Large language models (LLMs) have recently been applied to CGEC with significant results, and selecting appropriate reference examples can further improve their performance. However, existing methods predominantly retrieve examples by text similarity, a strategy that frequently mismatches the actual error pattern and returns lexically similar yet grammatically irrelevant sentences. To address this problem, we propose RE2, a method that retrieves appropriate examples using explanations of grammatical errors. Instead of relying on the text similarity of the input sentence, RE2 selects reference examples by their grammatical error explanations, and the LLM then uses these examples when correcting the input. We also create a high-quality grammatical error explanation (GEE) dataset, which is used in our research and serves as a valuable resource for future studies in both CGEC and GEE. Experimental results on two CGEC datasets indicate that the proposed method effectively improves CGEC performance.
grammatical error correction / large language model / grammatical error explanation
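The retrieval idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Example` record, the character-bigram Jaccard similarity, the prompt template, and the toy pool below are all assumptions made for illustration, and the explanation of the input sentence is assumed to come from some upstream explainer that is not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Example:
    source: str       # sentence containing a grammatical error
    target: str       # corrected sentence
    explanation: str  # explanation of the error (toy text, not taken from the GEE dataset)


def char_bigrams(text: str) -> set[str]:
    """Character bigrams of the text with whitespace removed (assumed feature)."""
    compact = "".join(text.lower().split())
    return {compact[i:i + 2] for i in range(len(compact) - 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of character bigrams; a stand-in for a real retriever."""
    sa, sb = char_bigrams(a), char_bigrams(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def retrieve_by_explanation(query_explanation: str, pool: list[Example], k: int = 1) -> list[Example]:
    """Rank examples by explanation similarity, not by similarity of the sentences."""
    ranked = sorted(
        pool,
        key=lambda ex: similarity(query_explanation, ex.explanation),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(sentence: str, examples: list[Example]) -> str:
    """Assemble a few-shot prompt from the retrieved examples (hypothetical template)."""
    parts = [
        f"Input: {ex.source}\nExplanation: {ex.explanation}\nOutput: {ex.target}"
        for ex in examples
    ]
    parts.append(f"Input: {sentence}\nOutput:")
    return "\n\n".join(parts)


# Toy pool for illustration only.
POOL = [
    Example("通过这次活动，使我明白了很多。", "这次活动使我明白了很多。",
            "missing subject caused by a preposition"),
    Example("他大约有三十岁左右。", "他大约有三十岁。",
            "redundant words with the same meaning"),
    Example("我吃饭在食堂。", "我在食堂吃饭。",
            "word order error"),
]

if __name__ == "__main__":
    # The query explanation would be produced upstream; here it is hard-coded.
    top = retrieve_by_explanation("missing subject", POOL, k=1)
    print(build_prompt("经过老师的帮助，使他进步很快。", top))
```

In this toy run the input sentence shares almost no characters with the retrieved example, yet both exhibit the same error type, which is the kind of match that retrieval by sentence similarity alone could miss.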
Higher Education Press