The ethical security of large language models: A systematic review

Feng LIU, Jiaqi JIANG, Yating LU, Zhanyi HUANG, Jiuming JIANG

Front. Eng ›› 2025, Vol. 12 ›› Issue (1): 128–140. DOI: 10.1007/s42524-025-4082-6
Information Management and Information Systems
REVIEW ARTICLE


Abstract

The widespread application of large language models (LLMs) has brought new security challenges and ethical concerns to the fore, attracting significant academic and societal attention. Analysis of the security vulnerabilities of LLMs and of their misuse in cybercrime shows that their advanced text-generation capabilities pose serious threats to personal privacy, data security, and information integrity. This review also evaluates the effectiveness of current LLM-based defense strategies. Finally, it examines the social implications of LLMs and proposes future directions for strengthening their secure application and ethical governance, with the aim of informing the development of the field.


Keywords

security of large language models / ethical governance / model defense / adversarial training / social impact

Cite this article

Feng LIU, Jiaqi JIANG, Yating LU, Zhanyi HUANG, Jiuming JIANG. The ethical security of large language models: A systematic review. Front. Eng, 2025, 12(1): 128–140. DOI: 10.1007/s42524-025-4082-6



RIGHTS & PERMISSIONS

Higher Education Press
