PRAE: progressive retrieval-augmented dynamic knowledge editing for large language models

Hao LI, Zheng CHU, Jiafeng LIANG, Yuxin WANG, Wei TANG, Xun MAO, Kai LV, Lei CHEN, Ming LIU, Bing QIN

Front. Comput. Sci. ›› 2027, Vol. 21 ›› Issue (1): 2101310 DOI: 10.1007/s11704-025-50492-z
Artificial Intelligence
RESEARCH ARTICLE

Abstract

The knowledge stored within large language models (LLMs) tends to become outdated as the real world rapidly evolves, so efficient knowledge editing methods have received increasing attention. Previous methods primarily focus on parametric knowledge injection, which struggles to scale to large numbers of edits and is time-consuming for each edit. An alternative approach is Retrieval-Augmented Generation (RAG), which enables efficient knowledge injection; however, it suffers from conflicts between internal and external knowledge and from fine-grained retrieval challenges. To address this, we propose Progressive Retrieval-Augmented Dynamic Knowledge Editing (PRAE), a knowledge editing framework based on contextual knowledge injection. PRAE fine-tunes LLMs on a carefully designed dataset to equip them with two core capabilities: progressive retrieval, which incorporates editing knowledge step by step to tackle multi-hop problems, and dynamic knowledge utilization, which allows flexible and effective use of retrieved knowledge. Experimental results on seven knowledge editing datasets demonstrate that our method outperforms state-of-the-art methods by 7.1% and 25.3% on single-hop and multi-hop tasks, respectively. Further analysis reveals that PRAE exhibits superior generalization capability and computational efficiency.
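To make the two capabilities described in the abstract more concrete, the following is a minimal, self-contained Python sketch (not the paper's implementation) of how progressive retrieval over an edit memory with dynamic knowledge utilization might look. The toy word-overlap retriever, the threshold value, and the names EditMemory, answer_hop, and progressive_answer are illustrative assumptions introduced here; PRAE itself fine-tunes the LLM and is described in the full text.

```python
# Illustrative sketch only: a toy edit memory with progressive, per-hop retrieval
# and a threshold-based decision on whether to use the retrieved edit at all.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Edit:
    subject: str
    relation: str
    new_object: str

    def as_fact(self) -> str:
        return f"{self.subject} {self.relation} {self.new_object}."


class EditMemory:
    """Stores edited facts and retrieves the best lexical match for a query."""

    def __init__(self, edits: List[Edit]):
        self.edits = list(edits)

    def retrieve(self, query: str, threshold: int = 2) -> Optional[Edit]:
        if not self.edits:
            return None
        q = set(query.lower().split())
        scored = [(len(q & set(e.as_fact().lower().split())), e) for e in self.edits]
        score, best = max(scored, key=lambda x: x[0])
        # Below the threshold, return nothing so the model falls back on its own knowledge.
        return best if score >= threshold else None


def answer_hop(sub_question: str, fact: Optional[Edit]) -> str:
    """Placeholder for the edited LLM: uses a retrieved fact only when one is supplied."""
    if fact is not None:
        return f"[answer '{sub_question}' using edited fact: {fact.as_fact()}]"
    return f"[answer '{sub_question}' from the model's parametric knowledge]"


def progressive_answer(sub_questions: List[str], memory: EditMemory) -> List[str]:
    """Resolve a multi-hop question hop by hop, querying the edit memory at each step."""
    answers = []
    for sq in sub_questions:
        fact = memory.retrieve(sq)            # progressive retrieval per hop
        answers.append(answer_hop(sq, fact))  # dynamic use of the retrieved knowledge
    return answers


if __name__ == "__main__":
    # A counterfactual edit of the kind used in knowledge-editing benchmarks.
    memory = EditMemory([Edit("The Eiffel Tower", "is located in", "Rome")])
    hops = ["Where is the Eiffel Tower located?", "What country is that city in?"]
    for step in progressive_answer(hops, memory):
        print(step)
```

In the real system, the placeholder answer_hop would be the fine-tuned LLM deciding whether the retrieved edit or its own parametric knowledge should prevail; the hop-by-hop loop is what this sketch is meant to convey.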


Keywords

knowledge editing / large language models / retrieval-augmented generation

Cite this article

Hao LI, Zheng CHU, Jiafeng LIANG, Yuxin WANG, Wei TANG, Xun MAO, Kai LV, Lei CHEN, Ming LIU, Bing QIN. PRAE: progressive retrieval-augmented dynamic knowledge editing for large language models. Front. Comput. Sci., 2027, 21(1): 2101310 DOI: 10.1007/s11704-025-50492-z



RIGHTS & PERMISSIONS

Higher Education Press
