Enhancing low-resource cross-lingual summarization from noisy data with fine-grained reinforcement learning
Yuxin HUANG, Huailing GU, Zhengtao YU, Yumeng GAO, Tong PAN, Jialong XU
Cross-lingual summarization (CLS) is the task of generating a summary in a target language from a document in a source language. Recently, end-to-end CLS models have achieved impressive results using large-scale, high-quality datasets, which are typically constructed by translating monolingual summary corpora into CLS corpora. However, because translation models for low-resource languages perform poorly, translation noise can seriously degrade CLS model performance. In this paper, we propose a fine-grained reinforcement learning approach for low-resource CLS on noisy data. We introduce the source-language summary as a gold signal to alleviate the impact of noise in the translated target-language summary. Specifically, we design a reinforcement reward by calculating the word correlation and word missing degree between the source-language summary and the generated target-language summary, and combine it with the cross-entropy loss to optimize the CLS model. To validate the performance of the proposed model, we construct Chinese-Vietnamese and Vietnamese-Chinese CLS datasets. Experimental results show that our model outperforms the baselines on both ROUGE score and BERTScore.
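To make the reward design concrete, the following is a minimal sketch of how a fine-grained reward built from word correlation and word missing degree might be combined with cross-entropy loss. All function names, the similarity threshold, and the mixing weight alpha are illustrative assumptions; the paper's exact formulas are not given in this abstract.

import torch
import torch.nn.functional as F

def fine_grained_reward(src_emb: torch.Tensor, gen_emb: torch.Tensor,
                        threshold: float = 0.5) -> torch.Tensor:
    # src_emb: (m, d) cross-lingual embeddings of source-language summary words
    # gen_emb: (n, d) embeddings of generated target-language summary words
    # Pairwise cosine similarity between generated and source words: (n, m)
    sim = F.cosine_similarity(gen_emb.unsqueeze(1), src_emb.unsqueeze(0), dim=-1)
    # Word correlation: how well each generated word matches some source word
    correlation = sim.max(dim=1).values.mean()
    # Word missing degree: fraction of source words no generated word covers
    # above an assumed similarity threshold
    missing = (sim.max(dim=0).values < threshold).float().mean()
    return correlation - missing  # higher reward = closer to the gold signal

def mixed_loss(ce_loss: torch.Tensor, log_probs: torch.Tensor,
               reward: torch.Tensor, alpha: float = 0.7) -> torch.Tensor:
    # Policy-gradient term: raise the likelihood of high-reward summaries
    rl_loss = -reward.detach() * log_probs.sum()
    # Interpolate with cross-entropy; alpha is an assumed hyperparameter
    return alpha * ce_loss + (1 - alpha) * rl_loss

In this sketch the reward encourages coverage of the source-language summary while penalizing omitted words, mirroring the word correlation and word missing degree terms named in the abstract; the paper's actual definitions of these quantities may differ.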
Cross-lingual summarization / Low-resource language / Noisy data / Fine-grained reinforcement learning / Word correlation / Word missing degree