Sentence-level Reward Model can Generalize Better for Aligning LLM from Human Preference
Wenjie Qiu, Yi-Chen Li, Xuqin Zhang, Tianyi Zhang, Yihang Zhang, Zongzhang Zhang, Yang Yu
Higher Education Press 2026