Top Pass: improve code generation by pass@k-maximized code ranking

Zhicun LYU, Xinye LI, Zheng XIE, Ming LI

Front. Comput. Sci. ›› 2025, Vol. 19 ›› Issue (8) : 198341 DOI: 10.1007/s11704-024-40415-9
Artificial Intelligence
RESEARCH ARTICLE


Abstract

Code generation has recently been greatly enhanced by advances in Large Language Models (LLMs). Nevertheless, LLM-based code generation approaches still struggle to produce error-free code within a few tries when faced with complex problems. To address this, the prevailing strategy is to sample a huge number of candidate programs in the hope that any one of them may work. However, users of code generation systems usually expect to find a correct program by reviewing or testing only a small number of candidates; otherwise, the system is of little help. In this paper, we propose Top Pass, a code ranking approach that identifies potentially correct solutions from a large pool of candidates. Top Pass directly optimizes the pass@k loss function, enhancing the quality at the top of the ranked candidate list and thus enabling the user to find a correct solution within as few tries as possible. Experimental results on four benchmarks show that Top Pass improves the usability of code generation models by producing better rankings, in particular achieving a 32.9% relative improvement in pass@1 on CodeContests compared with the state-of-the-art ranking method.
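
For readers unfamiliar with the metric, pass@k is the probability that at least one of k submitted candidate programs is correct. The sketch below is a minimal illustration, not the authors' implementation: `pass_at_k` follows the widely used unbiased estimator computed from n sampled programs of which c are correct, and `ranked_pass_at_k` is a hypothetical helper showing how a ranker that places a correct program near the top raises pass@1 without generating more samples.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Standard unbiased pass@k estimator: probability that at least one of
    k candidates drawn (without replacement) from n generated programs,
    c of which are correct, passes all tests."""
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

def ranked_pass_at_k(scores, labels, k: int) -> float:
    """Hypothetical helper (not from the paper): pass@k after ranking,
    i.e. 1.0 if any of the k highest-scored candidates is correct."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return float(any(labels[i] for i in order[:k]))

# Toy illustration: 3 correct programs hidden among 100 samples.
print(pass_at_k(n=100, c=3, k=1))                         # 0.03 with random selection
print(ranked_pass_at_k([0.9, 0.2, 0.4], [1, 0, 0], k=1))  # 1.0 once a correct program
                                                          # is ranked first
```

The paper's pass@k loss acts on the ranking model during training; the code above only illustrates the evaluation metric that this loss targets.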

Keywords

machine learning / data mining / software engineering

Cite this article

Zhicun LYU, Xinye LI, Zheng XIE, Ming LI. Top Pass: improve code generation by pass@k-maximized code ranking. Front. Comput. Sci., 2025, 19(8): 198341 DOI: 10.1007/s11704-024-40415-9



RIGHTS & PERMISSIONS

Higher Education Press
