On large language models safety, security, and privacy: A survey

Ran Zhang, Hong-Wei Li, Xin-Yuan Qian, Wen-Bo Jiang, Han-Xiao Chen

Journal of Electronic Science and Technology, 2025, Vol. 23, Issue 1: 100301. DOI: 10.1016/j.jnlest.2025.100301


Abstract

The integration of artificial intelligence (AI) technology, particularly large language models (LLMs), has become essential across various sectors due to their advanced language comprehension and generation capabilities. Despite their transformative impact in fields such as machine translation and intelligent dialogue systems, LLMs face significant safety, security, and privacy challenges, such as hallucinations, backdoor attacks, and privacy leakage, that undermine their trustworthiness and effectiveness. Previous works often conflated safety issues with security concerns. In contrast, our study provides clearer and more precise definitions of safety, security, and privacy within the context of LLMs. Building on these definitions, we provide a comprehensive overview of the vulnerabilities and defense mechanisms related to safety, security, and privacy in LLMs. Additionally, we explore the unique research challenges posed by LLMs and suggest potential avenues for future research, aiming to enhance the robustness and reliability of LLMs in the face of emerging threats.

Keywords

Large language models / Privacy issues / Safety issues / Security issues

Cite this article

Ran Zhang, Hong-Wei Li, Xin-Yuan Qian, Wen-Bo Jiang, Han-Xiao Chen. On large language models safety, security, and privacy: A survey. Journal of Electronic Science and Technology, 2025, 23(1): 100301. DOI: 10.1016/j.jnlest.2025.100301


CRediT authorship contribution statement

Ran Zhang: Writing–original draft, Methodology, Investigation, Formal analysis, Conceptualization. Hong-Wei Li: Writing–review & editing, Methodology, Project administration, Conceptualization, Funding acquisition. Xin-Yuan Qian: Writing–review & editing, Investigation, Formal analysis, Conceptualization. Wen-Bo Jiang: Writing–review & editing, Conceptualization, Resources, Investigation. Han-Xiao Chen: Supervision, Validation, Formal analysis, Methodology.

Declaration of competing interest

No potential conflict of interest was reported by the authors.

Acknowledgement

This work was supported by the National Key R&D Program of China under Grant No. 2022YFB3103500, the National Natural Science Foundation of China under Grants No. 62402087 and No. 62020106013, the Sichuan Science and Technology Program under Grant No. 2023ZYD0142, the Chengdu Science and Technology Program under Grant No. 2023-XT00-00002-GX, the Fundamental Research Funds for Chinese Central Universities under Grants No. ZYGX2020ZB027 and No. Y030232063003002, and the Postdoctoral Innovation Talents Support Program under Grant No. BX20230060.
