Building a marine reasoning large model: a method based on structured chain-of-thought fine-tuning and knowledge graph
Yanfei Lin , Zhilin Du , Xuening Sun , Xueyu Li , Cong Liu , Xiaoli Zheng , Enxiao Liu , Mukai Chen , Xiao Liu , Huijun Xuan , Muqi Luo , Yuzhen Wang , Zhi Gong , Ruomei Wang
Intelligent Marine Technology and Systems ›› 2026, Vol. 4 ›› Issue (1) : 4
To address long-standing professional-knowledge bottlenecks in marine scientific research and aquaculture, this paper proposes a framework for constructing a marine reasoning large language model based on structured reasoning chain-of-thought (SRCoT) fine-tuning and a knowledge graph (KG). To implement the framework, an indent-driven heuristic article search method is first adopted to construct a marine-domain-specific dataset, followed by a deduplication strategy based on a sliding window and a weight matrix. A marine-domain KG is then constructed, and an entity entailment method based on pointwise mutual information vectors is designed. Finally, a post-training approach integrating SRCoT with three-stage direct preference optimization (DPO) is proposed: the base model is fine-tuned on the marine-domain SRCoT dataset and then post-trained with the three-stage DPO strategy. During deployment, the custom-built marine-domain KG serves as an external reference to enhance model responses. Experimental results demonstrate that the model trained with the proposed framework improves performance on complex marine-domain reasoning tasks and effectively mitigates over-reasoning while refining model responses.
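The abstract does not detail the sliding-window deduplication step. As a minimal illustrative sketch only (the window size, character-level shingling, and Jaccard threshold below are all assumptions, not the paper's actual weight-matrix formulation), near-duplicate filtering over a corpus might look like:

```python
import math


def shingles(text: str, window: int = 8) -> set[str]:
    """Character-level sliding-window shingles (window size is a hypothetical choice)."""
    text = " ".join(text.split()).lower()  # normalize whitespace and case
    if len(text) <= window:
        return {text}
    return {text[i:i + window] for i in range(len(text) - window + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two shingle sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def deduplicate(docs: list[str], threshold: float = 0.8) -> list[str]:
    """Greedy near-duplicate removal: keep a document only if it is not
    too similar to any document already kept."""
    kept: list[str] = []
    kept_shingles: list[set[str]] = []
    for doc in docs:
        s = shingles(doc)
        if all(jaccard(s, k) < threshold for k in kept_shingles):
            kept.append(doc)
            kept_shingles.append(s)
    return kept


docs = [
    "Sea surface temperature rose in the East China Sea.",
    "Sea surface temperature rose in the  East China Sea.",  # near-duplicate
    "Kelp aquaculture expands along the Shandong coast.",
]
print(len(deduplicate(docs)))  # prints 2
```

The greedy pass is quadratic in corpus size; a production pipeline would typically combine this with hashing (e.g., MinHash) to scale.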
Large language model / Knowledge graph / Chain-of-thought / Data cleaning / Marine reasoning model / Direct preference optimization
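The entity entailment method rests on pointwise mutual information (PMI) vectors. A minimal sketch of how per-term PMI vectors can be built from sentence-level co-occurrence counts, assuming positive-PMI clipping and treating each sentence as the co-occurrence unit (both are illustrative choices, not the paper's specified construction):

```python
import math
from collections import Counter
from itertools import combinations


def pmi_vectors(sentences: list[list[str]]) -> tuple[list[str], dict[str, list[float]]]:
    """Return (vocab, {term: PMI vector over vocab}) from tokenized sentences.

    PMI(t, u) = log( p(t, u) / (p(t) * p(u)) ), estimated at sentence level,
    clipped at zero (positive PMI).
    """
    term_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    n = len(sentences)
    for sent in sentences:
        terms = set(sent)
        term_counts.update(terms)
        pair_counts.update(frozenset(p) for p in combinations(sorted(terms), 2))

    vocab = sorted(term_counts)
    vectors: dict[str, list[float]] = {}
    for t in vocab:
        vec = []
        for u in vocab:
            joint = pair_counts[frozenset((t, u))] / n
            if t == u or joint == 0:
                vec.append(0.0)
                continue
            pmi = math.log(joint / ((term_counts[t] / n) * (term_counts[u] / n)))
            vec.append(max(pmi, 0.0))  # positive PMI
        vectors[t] = vec
    return vocab, vectors


sentences = [
    ["kelp", "aquaculture"], ["kelp", "aquaculture"],
    ["carbon", "storage"], ["carbon", "storage"],
]
vocab, vectors = pmi_vectors(sentences)
# "kelp" co-occurs with "aquaculture" but never with "carbon"
print(vectors["kelp"][vocab.index("aquaculture")] > 0.0)  # prints True
print(vectors["kelp"][vocab.index("carbon")] == 0.0)      # prints True
```

Cosine similarity between such vectors can then serve as a co-occurrence-based signal for relating entities, though the paper's full entailment criterion is not reproduced here.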