Controllable data synthesis method for grammatical error correction

Liner YANG; Chengcheng WANG; Yun CHEN; Yongping DU; Erhong YANG

doi:10.1007/s11704-020-0286-4

Front. Comput. Sci., 2022, Vol. 16, Issue 4: 164318. DOI: 10.1007/s11704-020-0286-4

Artificial Intelligence

RESEARCH ARTICLE

Abstract

Due to the lack of parallel data for the grammatical error correction (GEC) task, models based on the sequence-to-sequence framework cannot be adequately trained to reach higher performance. We propose two data synthesis methods that can control the error rate and the ratio of error types in synthetic data. The first approach corrupts each word in a monolingual corpus with a fixed probability, using replacement, insertion, and deletion. The second approach trains error generation models and then filters their decoding results. Experiments on different synthetic data show that data with an error rate of 40% and an equal ratio of error types improves model performance the most. Finally, we synthesize about 100 million synthetic samples and achieve performance comparable to the state of the art, which uses twice as much data as we do.
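The first approach can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `corrupt`, the `vocab` argument used to draw replacement and inserted words, and the choice to insert after the current word are all assumptions made for illustration. The default `error_rate=0.4` and equal `weights` mirror the setting the abstract reports as working best.

```python
import random


def corrupt(tokens, vocab, error_rate=0.4, weights=(1.0, 1.0, 1.0), rng=None):
    """Return a noisy copy of `tokens` (hypothetical helper, not from the paper).

    Each token is corrupted with probability `error_rate`. The operation is
    drawn from (replace, insert, delete) in proportion to `weights`, which
    controls the ratio of error types.
    """
    rng = rng or random.Random()
    ops = ("replace", "insert", "delete")
    noisy = []
    for tok in tokens:
        if rng.random() >= error_rate:
            noisy.append(tok)  # keep the word unchanged
            continue
        op = rng.choices(ops, weights=weights, k=1)[0]
        if op == "replace":
            # Assumption: the substitute is drawn uniformly from `vocab`
            # (it may occasionally equal the original word).
            noisy.append(rng.choice(vocab))
        elif op == "insert":
            # Assumption: the extra word is inserted after the current one.
            noisy.append(tok)
            noisy.append(rng.choice(vocab))
        # "delete": append nothing, so the word is dropped
    return noisy


if __name__ == "__main__":
    sentence = "she goes to school every day".split()
    vocab = ["a", "the", "go", "went", "in", "on", "at"]
    print(" ".join(corrupt(sentence, vocab, rng=random.Random(0))))
```

Pairing each corrupted sentence (source) with its clean original (target) yields a synthetic parallel example; changing `error_rate` and `weights` is how such a sketch would expose the two controls described above.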

Keywords

grammatical error correction / sequence to sequence / data synthesis

Cite this article

Liner YANG, Chengcheng WANG, Yun CHEN, Yongping DU, Erhong YANG. Controllable data synthesis method for grammatical error correction. Front. Comput. Sci., 2022, 16(4): 164318 https://doi.org/10.1007/s11704-020-0286-4

Acknowledgements

This work was supported by funds from the Beijing Advanced Innovation Center for Language Resources (TYZ19005) and the Research Program of the State Language Commission (ZDI135-105, YB135-89).

RIGHTS & PERMISSIONS

2022 Higher Education Press

Received: 19 Jun 2020
Accepted: 11 Dec 2020
Published: 15 Aug 2022
Just Accepted Date: 16 Mar 2021
Issue Date: 01 Dec 2021
