Enhancing low-resource cross-lingual summarization from noisy data with fine-grained reinforcement learning

Yuxin HUANG; Huailing GU; Zhengtao YU; Yumeng GAO; Tong PAN; Jialong XU

doi:10.1631/FITEE.2300296

Front. Inform. Technol. Electron. Eng, 2024, Vol. 25, Issue 1: 121-134. DOI: 10.1631/FITEE.2300296

Abstract

Cross-lingual summarization (CLS) is the task of generating a summary in a target language from a document in a source language. Recently, end-to-end CLS models have achieved impressive results using large-scale, high-quality datasets typically constructed by translating monolingual summary corpora into CLS corpora. However, due to the limited performance of low-resource language translation models, translation noise can seriously degrade the performance of these models. In this paper, we propose a fine-grained reinforcement learning approach to address low-resource CLS based on noisy data. We introduce the source language summary as a gold signal to alleviate the impact of the translated noisy target summary. Specifically, we design a reinforcement reward by calculating the word correlation and word missing degree between the source language summary and the generated target language summary, and combine it with cross-entropy loss to optimize the CLS model. To validate the performance of our proposed model, we construct Chinese-Vietnamese and Vietnamese-Chinese CLS datasets. Experimental results show that our proposed model outperforms the baselines in terms of both the ROUGE score and BERTScore.
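As a rough illustration of the kind of objective the abstract describes, the sketch below scores a generated target-language summary against the source-language summary and mixes a reward-weighted term with cross-entropy. It is not the authors' implementation: the toy lexicon `LEXICON`, the scoring formulas in `word_correlation` and `word_missing_degree`, the `threshold`, the reward as a simple difference, and the mixing weight `gamma` are all assumptions made for this example, since the abstract does not specify them.

```python
# Hypothetical toy bilingual lexicon: source word -> {target word: alignment score}.
# Assumption for illustration only; the abstract does not say how correlation is computed.
LEXICON = {
    "经济": {"kinh_tế": 0.9},
    "增长": {"tăng_trưởng": 0.8, "tăng": 0.5},
    "放缓": {"chậm_lại": 0.7},
}


def word_correlation(src_summary, generated):
    """Mean, over generated words, of the best alignment score to any source-summary word."""
    if not generated:
        return 0.0
    total = 0.0
    for tgt in generated:
        total += max(
            (LEXICON.get(src, {}).get(tgt, 0.0) for src in src_summary),
            default=0.0,
        )
    return total / len(generated)


def word_missing_degree(src_summary, generated, threshold=0.5):
    """Fraction of source-summary words with no generated word aligned at or above threshold."""
    if not src_summary:
        return 0.0
    missing = 0
    for src in src_summary:
        best = max(
            (LEXICON.get(src, {}).get(tgt, 0.0) for tgt in generated),
            default=0.0,
        )
        if best < threshold:
            missing += 1
    return missing / len(src_summary)


def reward(src_summary, generated):
    """Assumed reward: high correlation is rewarded, missing source words are penalized."""
    return word_correlation(src_summary, generated) - word_missing_degree(src_summary, generated)


def mixed_loss(token_log_probs, ce_loss, r, gamma=0.5):
    """Assumed mixing: gamma * policy-gradient-style term + (1 - gamma) * cross-entropy."""
    rl_loss = -r * sum(token_log_probs)
    return gamma * rl_loss + (1.0 - gamma) * ce_loss


src = ["经济", "增长", "放缓"]
gen = ["kinh_tế", "tăng_trưởng"]
r = reward(src, gen)
loss = mixed_loss([-0.1, -0.2], ce_loss=1.0, r=r)
```

In this toy case the generated summary covers two of the three source-summary words, so the missing-degree term lowers the reward even though both generated words align well. A real system would replace the lexicon with learned alignment or embedding similarity and compute the log-probabilities from the CLS model.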

Keywords

Cross-lingual summarization / Low-resource language / Noisy data / Fine-grained reinforcement learning / Word correlation / Word missing degree

Cite this article

Yuxin HUANG, Huailing GU, Zhengtao YU, Yumeng GAO, Tong PAN, Jialong XU. Enhancing low-resource cross-lingual summarization from noisy data with fine-grained reinforcement learning. Front. Inform. Technol. Electron. Eng, 2024, 25(1): 121‒134 https://doi.org/10.1631/FITEE.2300296