Building a marine reasoning large model: a method based on structured chain-of-thought fine-tuning and knowledge graph

Yanfei Lin, Zhilin Du, Xuening Sun, Xueyu Li, Cong Liu, Xiaoli Zheng, Enxiao Liu, Mukai Chen, Xiao Liu, Huijun Xuan, Muqi Luo, Yuzhen Wang, Zhi Gong, Ruomei Wang

Intelligent Marine Technology and Systems, 2026, Vol. 4, Issue 1: 4. DOI: 10.1007/s44295-025-00091-2

Research Paper

Abstract

To address long-standing professional-knowledge bottlenecks in marine scientific research and aquaculture, this paper proposes a framework for constructing a marine reasoning large language model based on structured reasoning chain-of-thought (SRCoT) fine-tuning and a knowledge graph (KG). To implement the framework, an indent-driven heuristic article-search method is first adopted to construct a marine-domain dataset, and a deduplication strategy based on a sliding window and a weight matrix is then developed. A marine-domain KG is subsequently constructed, together with an entity entailment method based on pointwise mutual information (PMI) vectors. Finally, a post-training approach integrating SRCoT fine-tuning with three-stage direct preference optimization (DPO) is proposed: the base model is fine-tuned on the marine-domain SRCoT dataset and then post-trained with the three-stage DPO strategy. At deployment time, the custom-built marine-domain KG serves as an external reference for enhancing model responses. Experimental results demonstrate that the model trained with the proposed framework improves performance on complex marine-domain reasoning tasks, mitigates over-reasoning, and produces more refined responses.
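Two standard building blocks mentioned in the abstract admit compact illustrations; the paper-specific parts (the entailment criterion applied to the PMI vectors and the three-stage DPO schedule) are not reproduced here. Pointwise mutual information between an entity e and a context term c is conventionally defined as PMI(e, c) = log [p(e, c) / (p(e) p(c))]. Below is a minimal Python sketch of building positive-PMI vectors from co-occurrence counts; function and parameter names are illustrative, not taken from the paper.

```python
import math
from collections import Counter, defaultdict

def pmi_vectors(corpus, entities, window=5):
    """Build positive-PMI vectors for entities from a co-occurrence matrix.

    corpus:   iterable of tokenized sentences (lists of strings)
    entities: set of entity strings to build vectors for
    window:   symmetric co-occurrence window (illustrative default)
    """
    cooc = defaultdict(Counter)  # entity -> context-term co-occurrence counts
    for tokens in corpus:
        for i, tok in enumerate(tokens):
            if tok not in entities:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    cooc[tok][tokens[j]] += 1

    # Marginals over the co-occurrence matrix (standard word-context PMI estimation)
    ent_marginal = {e: sum(ctx.values()) for e, ctx in cooc.items()}
    ctx_marginal = Counter()
    for ctx in cooc.values():
        ctx_marginal.update(ctx)
    total = sum(ent_marginal.values())

    vectors = {}
    for ent, ctx in cooc.items():
        vec = {}
        for term, n in ctx.items():
            # PMI = log( p(e,c) / (p(e) p(c)) ) = log( n * total / (marg_e * marg_c) )
            pmi = math.log((n * total) / (ent_marginal[ent] * ctx_marginal[term]))
            vec[term] = max(0.0, pmi)  # keep positive PMI only
        vectors[ent] = vec
    return vectors
```

Direct preference optimization (Rafailov et al. 2023) trains on triples of a prompt x, a preferred response y_w, and a dispreferred response y_l, minimizing -log sigma(beta * [(log pi_theta(y_w|x) - log pi_ref(y_w|x)) - (log pi_theta(y_l|x) - log pi_ref(y_l|x))]). A minimal PyTorch sketch of this loss, assuming per-example sequence log-probabilities have already been computed, follows; how the paper schedules its three DPO stages is not inferred here.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss over summed log-probs of chosen (w) and rejected (l) responses.

    All inputs are tensors of per-example sequence log-probabilities.
    beta controls the strength of the implicit KL constraint to the reference model.
    """
    chosen_logratio = policy_logp_w - ref_logp_w
    rejected_logratio = policy_logp_l - ref_logp_l
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()
```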

Keywords

Large language model / Knowledge graph / Chain-of-thought / Data cleaning / Marine reasoning model / Direct preference optimization

Cite this article

Yanfei Lin, Zhilin Du, Xuening Sun, Xueyu Li, Cong Liu, Xiaoli Zheng, Enxiao Liu, Mukai Chen, Xiao Liu, Huijun Xuan, Muqi Luo, Yuzhen Wang, Zhi Gong, Ruomei Wang. Building a marine reasoning large model: a method based on structured chain-of-thought fine-tuning and knowledge graph. Intelligent Marine Technology and Systems, 2026, 4(1): 4. DOI: 10.1007/s44295-025-00091-2



Rights & permissions

The Author(s)
