On protecting the data privacy of Large Language Models (LLMs) and LLM agents: A literature review

Biwei Yan, Kun Li, Minghui Xu, Yueyan Dong, Yue Zhang, Zhaochun Ren, Xiuzhen Cheng

High-Confidence Computing ›› 2025, Vol. 5 ›› Issue (2) : 100300 DOI: 10.1016/j.hcc.2025.100300

Review article

Abstract

Large Language Models (LLMs) are complex artificial intelligence systems that can understand, generate, and translate human language. By analyzing large amounts of textual data, these models learn language patterns that enable tasks such as writing, conversation, and summarization. Agents built on LLMs (LLM agents) further extend these capabilities, allowing them to process user interactions and perform complex operations in diverse task environments. However, while processing and generating massive amounts of data, LLMs and LLM agents risk leaking sensitive information, potentially threatening data privacy. This paper examines the data privacy issues associated with LLMs and LLM agents to provide a comprehensive understanding of the field. Specifically, we conduct an in-depth survey of privacy threats, encompassing both passive privacy leakage and active privacy attacks. We then introduce the privacy protection mechanisms employed by LLMs and LLM agents and provide a detailed analysis of their effectiveness. Finally, we explore the privacy protection challenges facing LLMs and LLM agents and outline potential directions for future developments in this domain.

Keywords

Large Language Models (LLMs) / Security / Data privacy / Privacy protection / LLM agents / Survey

Cite this article

Biwei Yan, Kun Li, Minghui Xu, Yueyan Dong, Yue Zhang, Zhaochun Ren, Xiuzhen Cheng. On protecting the data privacy of Large Language Models (LLMs) and LLM agents: A literature review. High-Confidence Computing, 2025, 5(2): 100300. DOI: 10.1016/j.hcc.2025.100300


CRediT authorship contribution statement

Biwei Yan: Writing - review & editing, Writing - original draft, Visualization, Methodology, Formal analysis, Conceptualization. Kun Li: Methodology. Minghui Xu: Methodology. Yueyan Dong: Writing - original draft. Yue Zhang: Writing - review & editing, Methodology, Conceptualization. Zhaochun Ren: Data curation, Conceptualization. Xiuzhen Cheng: Writing - review & editing, Methodology, Formal analysis.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (62402288 and 62302063) and the China Postdoctoral Science Foundation (2024M751811).
