Paradox of poetic intent in back-translation: evaluating the quality of large language models in Chinese translation

Li WEIGANG; Pedro Carvalho BROM

doi:10.1631/FITEE.2500298

Front. Inform. Technol. Electron. Eng, 2025, Vol. 26, Issue 11: 2176-2203. DOI: 10.1631/FITEE.2500298

Research Article

Abstract

Large language models (LLMs) excel in multilingual translation tasks, yet often struggle with culturally and semantically rich Chinese texts. This study introduces LLM-BT, a back-translation (BT) framework powered by LLMs, to evaluate Chinese → intermediate language → Chinese translation quality across five LLMs and three traditional systems. We construct a diverse corpus of scientific abstracts, historical paradoxes, and literary metaphors that reflects the complexity of Chinese at the lexical and semantic levels. Using our modular NLPMetrics system, which includes bilingual evaluation understudy (BLEU), character F-score (CHRF), translation edit rate (TER), and semantic similarity (SS), we find that LLMs outperform traditional tools in cultural and literary tasks. However, the results also uncover a high-dimensional behavioral phenomenon, the paradox of poetic intent, in which surface fluency is preserved but metaphorical or emotional depth is lost. Additionally, some models exhibit verbatim BT, suggesting a form of data-driven quasi-self-awareness, particularly under repeated or cross-model evaluation. To address BLEU's limitations for Chinese, we propose a Jieba-segmentation BLEU variant that incorporates word-frequency and n-gram weighting, improving sensitivity to lexical segmentation and term consistency. Supplementary tests show that in certain semantic dimensions, LLM outputs approach the fidelity of human poetic translations, despite lacking deeper metaphorical intent. Overall, this study reframes the traditional fidelity vs. fluency evaluation as a richer, multi-layered analysis of LLM behavior, offering a transparent framework that contributes to explainable artificial intelligence and identifies new research pathways in cultural natural language processing and multilingual LLM alignment.
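
To make the round-trip comparison concrete, the following is a minimal sketch of how a Chinese source text might be scored against its back-translation with a character n-gram F-score. It is not the NLPMetrics implementation described in the article: the function names `char_ngrams` and `simple_chrf`, the defaults `max_n=3` and `beta=2.0`, and the example back-translation string are all assumptions chosen for illustration, and the metric is a simplified chrF-style score rather than a reference implementation.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams in text, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return Counter(tuple(chars[i:i + n]) for i in range(len(chars) - n + 1))


def simple_chrf(reference: str, hypothesis: str,
                max_n: int = 3, beta: float = 2.0) -> float:
    """Simplified chrF-style score (hypothetical helper, not the paper's code).

    Precision and recall are averaged over n-gram orders 1..max_n and then
    combined into an F-beta score; beta > 1 weights recall more heavily.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        ref = char_ngrams(reference, n)
        hyp = char_ngrams(hypothesis, n)
        if not ref or not hyp:
            continue  # one of the texts is too short for this order
        overlap = sum((ref & hyp).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


# Source line and an invented back-translation (illustrative only).
source = "床前明月光"
back_translated = "明亮的月光在床前"

verbatim_score = simple_chrf(source, source)
paraphrase_score = simple_chrf(source, back_translated)
print(verbatim_score, paraphrase_score)
```

A verbatim back-translation scores 1.0 under this sketch, while a paraphrase that keeps the meaning but reorders the characters scores well below that. This shows why surface-overlap metrics alone cannot separate lost poetic intent from legitimate rewording, and why the study pairs them with a semantic similarity measure.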

Keywords

Back-translation / Chinese natural language processing / Large language model-based back-translation (LLM-BT) / Paradox of poetic intent / Quasi-self-awareness / Verbatim back-translation

Cite this article

Li WEIGANG, Pedro Carvalho BROM. Paradox of poetic intent in back-translation: evaluating the quality of large language models in Chinese translation. Front. Inform. Technol. Electron. Eng, 2025, 26(11): 2176-2203. DOI: 10.1631/FITEE.2500298