Exploiting large language model with reinforcement learning for generative job recommendations
Zhi ZHENG, Zhao-Peng QIU, Chen ZHU, Xiao HU, Li-Kang WU, Yang SONG, Heng-Shu ZHU, Hui XIONG
Front. Comput. Sci., 2026, Vol. 20, Issue 1: 2001303
With the rapid development of Large Language Models (LLMs), an increasing number of researchers are turning their attention to Generative Recommender Systems (GRSs), which are not constrained by strict candidate sets and are more conducive to exploring user interests. Existing LLM-based GRSs mainly utilize Supervised Fine-Tuning (SFT) to endow LLMs with the capability to generate candidate items, and further employ similarity-based grounding methods to map the generated results to real-world items. However, SFT-based training is insufficient for LLMs to adequately grasp the knowledge embedded in complex interactive behaviors, and similarity-based grounding methods also face challenges in long-text matching. Therefore, in this paper, we propose Generative job Recommendation based on Large language models (GIRL). Specifically, we train a model that evaluates the matching degree between a curriculum vitae (CV) and a job description (JD) as a reward model, and we use a proximal policy optimization (PPO)-based reinforcement learning (RL) method to fine-tune the LLM-based recommender. Moreover, we propose a model-based grounding method for JD grounding. Extensive experiments on two real-world datasets demonstrate the superiority of the proposed model compared to seven baseline methods.
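The abstract outlines a reward model that scores CV–JD matching degree and a PPO-based RL step for fine-tuning the generator. The exact models are not given here, so the following is only a minimal sketch of the two ingredients under stated assumptions: `match_reward` is a toy stand-in for the learned CV–JD matching reward model (the paper's reward model is a trained neural scorer, not this overlap heuristic), and `ppo_clip_loss` is the standard PPO clipped surrogate objective that would be applied per generated token. All names are illustrative, not from the paper.

```python
def match_reward(cv_tokens, jd_tokens):
    """Toy stand-in for the CV-JD matching reward model: scores the
    fraction of generated JD tokens that also appear in the CV.
    The actual GIRL reward model is a trained matching network."""
    cv, jd = set(cv_tokens), set(jd_tokens)
    return len(cv & jd) / max(len(jd), 1)

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate objective (returned as a loss
    to minimize). `ratio` is pi_new(a|s) / pi_old(a|s) for a generated
    token; `advantage` is derived from the reward model's score."""
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps) * advantage
    return -min(unclipped, clipped)

# Illustrative usage: a positive advantage with an inflated ratio is
# clipped at 1 + eps, limiting the size of the policy update.
r = match_reward(["python", "ml", "nlp"], ["ml", "nlp"])   # 1.0
loss_in_range = ppo_clip_loss(ratio=1.0, advantage=r)      # -1.0
loss_clipped = ppo_clip_loss(ratio=2.0, advantage=r)       # -1.2
```

The clipping is what distinguishes PPO from vanilla policy gradient: it keeps the fine-tuned LLM from drifting too far from the SFT policy in a single update, which matters when the reward model is imperfect.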
Keywords: large language model / online recruitment / reinforcement learning
Higher Education Press