GraphInstruct: empowering large language models with graph understanding and reasoning capability

Zihan LUO , Xiran SONG , Hong HUANG , Jianxun LIAN , Chenhao ZHANG , Jinqi JIANG , Xing XIE , Hai JIN

Front. Comput. Sci. ›› 2027, Vol. 21 ›› Issue (1): 2101302
DOI: 10.1007/s11704-025-51382-0
Artificial Intelligence
RESEARCH ARTICLE

Abstract

Improving the general capabilities of Large Language Models (LLMs) is an active research topic. Since graphs are a common data structure in many real-world domains, understanding graph data is a crucial step toward advancing general intelligence. To this end, we propose a dynamic benchmark named GraphInstruct, which comprehensively covers 21 classical graph reasoning tasks and provides diverse graph generation pipelines together with detailed intermediate reasoning steps for each sample. Based on GraphInstruct, we develop GraphSolver via efficient instruction tuning; it demonstrates prominent graph understanding capability compared with other open-source LLMs. To further endow LLMs with multi-step graph reasoning capability, we propose a label-mask training strategy and build GraphSolver+, which applies masked supervision to intermediate reasoning tokens so as to emphasize crucial node-identification signals. This work is one of the pioneering efforts to enhance the graph understanding and reasoning abilities of LLMs, and extensive experiments demonstrate the superiority of GraphSolver and GraphSolver+ over other LLMs. We sincerely hope GraphInstruct will facilitate further research on applying LLMs to graph-structured data. Our code and data are publicly available at github.com/CGCL-codes/GraphInstruct.
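The label-mask training strategy described above keeps the loss on tokens that carry crucial signals (e.g., node identifiers in intermediate reasoning steps) and drops supervision elsewhere. Below is a minimal pure-Python sketch of this general idea; the function names, the toy inputs, and the particular masking policy are illustrative assumptions, not the paper's actual implementation.

```python
import math

IGNORE_INDEX = -100  # positions relabeled with this value contribute no loss

def label_mask_loss(logits, targets, crucial_mask):
    """Mean negative log-likelihood over supervised positions only.

    logits: per-token lists of vocabulary scores; targets: token ids;
    crucial_mask: booleans marking tokens that keep their supervision
    (illustrative stand-in for "crucial node-identification" tokens).
    """
    labels = [t if keep else IGNORE_INDEX
              for t, keep in zip(targets, crucial_mask)]
    total, count = 0.0, 0
    for scores, label in zip(logits, labels):
        if label == IGNORE_INDEX:
            continue  # masked-out token: no training signal
        log_z = math.log(sum(math.exp(s) for s in scores))  # log-partition
        total += log_z - scores[label]                      # token NLL
        count += 1
    return total / count

# Toy sequence of 4 tokens over a 3-word vocabulary; only tokens 1 and 3
# (say, node identifiers) are supervised.
logits = [[2.0, 0.5, 0.1], [0.3, 1.7, 0.2], [1.0, 1.0, 1.0], [0.2, 0.1, 2.5]]
targets = [0, 1, 2, 2]
mask = [False, True, False, True]
loss = label_mask_loss(logits, targets, mask)
```

The `IGNORE_INDEX = -100` convention mirrors the `ignore_index` default of PyTorch's cross-entropy loss, so the same relabeling carries over directly to standard instruction-tuning pipelines.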


Keywords

LLM / graph reasoning / instruction tuning

Cite this article

Zihan LUO, Xiran SONG, Hong HUANG, Jianxun LIAN, Chenhao ZHANG, Jinqi JIANG, Xing XIE, Hai JIN. GraphInstruct: empowering large language models with graph understanding and reasoning capability. Front. Comput. Sci., 2027, 21(1): 2101302 DOI:10.1007/s11704-025-51382-0



RIGHTS & PERMISSIONS

Higher Education Press
