LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities with Two-Stage Rule-Based RL

Yingzhe Peng, Gongrui Zhang, Miaosen Zhang, Zhiyuan You, Jie Liu, Qipeng Zhu, Kai Yang, Xingzhong Xu, Xin Geng, Xu Yang

Front. Comput. Sci. ›› DOI: 10.1007/s11704-026-51776-8
RESEARCH ARTICLE

Abstract

Large multimodal models (LMMs) have demonstrated strong performance on vision-language tasks, yet their reasoning capabilities remain limited, particularly in smaller models such as 3B LMMs, where constrained model capacity restricts reasoning ability. Enhancing the reasoning of LMMs presents two challenges compared to language models: (1) high-quality multimodal reasoning data is scarcer than textual data, and (2) the alignment between vision encoders and language decoders often degrades pretrained language reasoning skills. In this work, we propose a two-phase training framework, LMM-R1, to improve the reasoning abilities of 3B LMMs. The first phase, Foundational Reasoning Enhancement (FRE), restores core language reasoning by training on abundant text-only data through rule-based reinforcement learning (RL). This phase leverages high-quality textual data to acquire reasoning skills without costly multimodal supervision. The second phase, Multimodal Generalization Training (MGT), extends these skills to the multimodal domain. Our experiments show that FRE outperforms direct RL on multimodal data across various tasks. LMM-R1 achieves a 4.5% and 4.69% average improvement on text-only and multimodal benchmarks, respectively, and a 4.6% improvement on agent-related tasks, comparable to models like GPT-4o. These results highlight the importance of enhancing foundational reasoning for effective multimodal generalization. The code is available at https://github.com/GlowLED/lmm-r1-ascend.
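The abstract's rule-based RL relies on verifiable rewards rather than a learned reward model. As a minimal sketch of what such a rule might look like, the function below scores a model response 1.0 if its final `\boxed{...}` answer exactly matches the ground truth and 0.0 otherwise; the boxed-answer convention and exact-match rule are illustrative assumptions, not the paper's exact implementation.

```python
import re


def rule_based_reward(response: str, ground_truth: str) -> float:
    """Illustrative rule-based reward for RL training.

    Extracts the last \\boxed{...} span from the response and awards
    1.0 for an exact (whitespace-stripped) match with the ground-truth
    answer, else 0.0. Details are assumptions for illustration only.
    """
    matches = re.findall(r"\\boxed\{([^}]*)\}", response)
    if not matches:
        return 0.0  # no parseable final answer -> zero reward
    return 1.0 if matches[-1].strip() == ground_truth.strip() else 0.0
```

Because the reward is computed from a deterministic rule over the output text, it needs no costly multimodal annotation, which is what lets the FRE phase scale on abundant text-only data.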

Keywords

large multimodal models / reasoning / reinforcement learning

Cite this article

Yingzhe Peng, Gongrui Zhang, Miaosen Zhang, Zhiyuan You, Jie Liu, Qipeng Zhu, Kai Yang, Xingzhong Xu, Xin Geng, Xu Yang. LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities with Two-Stage Rule-Based RL. Front. Comput. Sci. DOI: 10.1007/s11704-026-51776-8




© Higher Education Press 2026
