Software defect detection using large language models: a literature review

Yu CHEN, Yi SHEN, Taiyan WANG, Shiwen OU, Ruipeng WANG, Yuwei LI, Zulie PAN

Front. Comput. Sci. ›› 2026, Vol. 20 ›› Issue (6) : 2006202

DOI: 10.1007/s11704-025-40672-2
REVIEW ARTICLE



Abstract

As software systems grow in complexity, efficient defect detection becomes increasingly vital to maintaining software quality. In recent years, artificial intelligence has advanced rapidly; in particular, the advent of Large Language Models (LLMs) has revealed their considerable potential to improve software defect detection. This review elucidates the relationship between LLMs and software defect detection. We categorize and summarize existing research according to the distinct ways LLMs are applied in dynamic and static detection scenarios. Dynamic detection methods are grouped by the phase in which they employ LLMs, such as test case generation, feedback guidance, and output assessment. Static detection methods are classified according to whether they analyze the source code or the binary of the software under test. We further examine the prompt engineering and model fine-tuning strategies adopted in these studies. Finally, we summarize the emerging trend of integrating LLMs into software defect detection, identify open challenges, and outline promising research directions.
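The three dynamic-detection phases named above (test case generation, feedback guidance, output assessment) can be illustrated with a minimal, runnable sketch of an LLM-in-the-loop fuzzing round. This is an assumption-laden toy, not any surveyed tool: `query_llm` is a hypothetical stand-in for a real model API call and is stubbed with fixed outputs so the loop executes, and `run_target` is a toy program under test.

```python
# Sketch of one LLM-guided fuzzing round: the model proposes test cases,
# the harness executes them, and execution feedback is folded back into
# the next prompt. `query_llm` is a hypothetical placeholder, NOT a real
# API; a real implementation would send the prompt to a model endpoint.

import json


def query_llm(prompt: str) -> list[str]:
    """Placeholder for a real LLM call; returns candidate test inputs."""
    return ['{"depth": 1}', '{"depth": 999999}', "not-json"]


def run_target(test_input: str) -> tuple[bool, str]:
    """Toy target: parse the input and report (crashed?, feedback)."""
    try:
        cfg = json.loads(test_input)
    except json.JSONDecodeError:
        return False, "rejected: malformed input"
    if cfg.get("depth", 0) > 1000:
        return True, "crash: recursion limit exceeded"
    return False, "ok"


def fuzz_round(seed_prompt: str) -> tuple[list[str], str]:
    """One generation -> execution -> assessment cycle."""
    crashes, feedback_lines = [], []
    for case in query_llm(seed_prompt):
        crashed, feedback = run_target(case)
        if crashed:
            crashes.append(case)
        feedback_lines.append(f"{case!r} -> {feedback}")
    # Feedback guidance: fold results into the prompt for the next round.
    next_prompt = seed_prompt + "\nPrevious results:\n" + "\n".join(feedback_lines)
    return crashes, next_prompt


crashes, next_prompt = fuzz_round("Generate JSON configs that stress the parser.")
print(crashes)  # inputs that triggered a crash
```

Surveyed approaches differ mainly in which of these three steps the LLM performs and how much execution feedback it receives.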


Keywords

software defect detection / large language models / prompt engineering / fine-tuning

Cite this article

Yu CHEN, Yi SHEN, Taiyan WANG, Shiwen OU, Ruipeng WANG, Yuwei LI, Zulie PAN. Software defect detection using large language models: a literature review. Front. Comput. Sci., 2026, 20(6): 2006202. DOI: 10.1007/s11704-025-40672-2



RIGHTS & PERMISSIONS

The Author(s) 2025. This article is published with open access at link.springer.com and journal.hep.com.cn
