Sentence-level Reward Model can Generalize Better for Aligning LLM from Human Preference

Wenjie Qiu, Yi-Chen Li, Xuqin Zhang, Tianyi Zhang, Yihang Zhang, Zongzhang Zhang, Yang Yu

Front. Comput. Sci.

DOI: 10.1007/s11704-026-51483-4
LETTER

Cite this article

Wenjie Qiu, Yi-Chen Li, Xuqin Zhang, Tianyi Zhang, Yihang Zhang, Zongzhang Zhang, Yang Yu. Sentence-level Reward Model can Generalize Better for Aligning LLM from Human Preference. Front. Comput. Sci. DOI:10.1007/s11704-026-51483-4

RIGHTS & PERMISSIONS

Higher Education Press 2026
