LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities with Two-Stage Rule-Based RL

Yingzhe Peng, Gongrui Zhang, Miaosen Zhang, Zhiyuan You, Jie Liu, Qipeng Zhu, Kai Yang, Xingzhong Xu, Xin Geng, Xu Yang

Front. Comput. Sci. ›› DOI: 10.1007/s11704-026-51776-8
RESEARCH ARTICLE

Abstract

Large multimodal models (LMMs) have demonstrated strong performance on vision-language tasks, yet their reasoning capabilities remain limited, particularly in smaller models such as 3B LMMs, where constrained model capacity restricts reasoning ability. Enhancing the reasoning of LMMs presents two challenges compared to language models: (1) high-quality multimodal reasoning data is scarcer than textual data, and (2) the alignment between vision encoders and language decoders often degrades pretrained language reasoning skills. In this work, we propose a two-phase training framework, LMM-R1, to improve the reasoning abilities of 3B LMMs. The first phase, Foundational Reasoning Enhancement (FRE), restores core language reasoning by training on abundant text-only data through rule-based reinforcement learning (RL). This phase leverages high-quality textual data to acquire reasoning skills without costly multimodal supervision. The second phase, Multimodal Generalization Training (MGT), extends these skills to the multimodal domain. Our experiments show that FRE outperforms direct RL on multimodal data across various tasks. LMM-R1 achieves a 4.5% and 4.69% average improvement on text-only and multimodal benchmarks, respectively, and a 4.6% improvement on agent-related tasks, comparable to models like GPT-4o. These results highlight the importance of enhancing foundational reasoning for effective multimodal generalization. The code is available at https://github.com/GlowLED/lmm-r1-ascend.
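The abstract's rule-based RL relies on verifiable rewards rather than a learned reward model. As a minimal sketch of what such a rule might look like, the function below scores a model response 1.0 if its final `\boxed{...}` answer exactly matches the ground truth and 0.0 otherwise; the boxed-answer convention and exact-match rule are illustrative assumptions, not the paper's exact implementation.

```python
import re


def rule_based_reward(response: str, ground_truth: str) -> float:
    """Illustrative rule-based reward for RL training.

    Extracts the last \\boxed{...} span from the response and awards
    1.0 for an exact (whitespace-stripped) match with the ground-truth
    answer, else 0.0. Details are assumptions for illustration only.
    """
    matches = re.findall(r"\\boxed\{([^}]*)\}", response)
    if not matches:
        return 0.0  # no parseable final answer -> zero reward
    return 1.0 if matches[-1].strip() == ground_truth.strip() else 0.0
```

Because the reward is computed from a deterministic rule over the output text, it needs no costly multimodal annotation, which is what lets the FRE phase scale on abundant text-only data.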

Keywords

large multimodal models / reasoning / reinforcement learning

Cite this article

Yingzhe Peng, Gongrui Zhang, Miaosen Zhang, Zhiyuan You, Jie Liu, Qipeng Zhu, Kai Yang, Xingzhong Xu, Xin Geng, Xu Yang. LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities with Two-Stage Rule-Based RL. Front. Comput. Sci. DOI: 10.1007/s11704-026-51776-8




© Higher Education Press 2026
