DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging
Tzu-Han Lin, Chen-An Li, Hung-yi Lee, Yun-Nung Chen
2024-07-02

Summary
This paper introduces a new method called DogeRM that improves how AI reward models understand and respond to human preferences by combining general knowledge with specialized knowledge from specific areas, such as math or coding.
What's the problem?
The main issue is that training AI models to understand what people want requires paired preference data, which is expensive and time-consuming to collect. This is especially true for preferences in specialized domains, where expert annotators are needed, making it hard to gather enough data for training.
What's the solution?
To solve this problem, the authors created DogeRM, which merges a general reward model (an AI that scores responses based on broad human preferences) with a model that has specialized knowledge in a particular area. Because the two models are combined at the level of their weights rather than through further training, the reward model gains domain expertise without needing nearly as much specialized preference data.
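The paper does not spell out its exact merging recipe in this summary, but a common form of model merging is weighted parameter averaging over a shared backbone. The sketch below is illustrative only: the function name, the `alpha` weighting, and the rule of keeping reward-model-only parameters (such as a reward head) unchanged are assumptions, not the paper's confirmed procedure.

```python
import torch

def merge_state_dicts(rm_state, domain_state, alpha=0.5):
    """Linearly interpolate parameters shared by both models.

    alpha weights the general reward model; (1 - alpha) weights the
    domain-specific model. Parameters present only in the reward model
    (e.g., its reward head) are copied over unchanged.
    """
    merged = {}
    for name, rm_param in rm_state.items():
        if name in domain_state and domain_state[name].shape == rm_param.shape:
            merged[name] = alpha * rm_param + (1.0 - alpha) * domain_state[name]
        else:
            merged[name] = rm_param.clone()
    return merged

# Hypothetical usage: load two models that share a backbone, merge, reload.
# reward_model.load_state_dict(
#     merge_state_dicts(reward_model.state_dict(),
#                       domain_model.state_dict(), alpha=0.7)
# )
```

The appeal of this kind of merging is that it reuses models that already exist (a general reward model and a domain fine-tuned model) instead of collecting new expert-labeled preference data.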
Why does it matter?
This research is important because it offers a cheaper, more efficient way to give reward models expert knowledge. By merging in domain knowledge, DogeRM can help AI systems align more closely with what humans want in specialized fields, which can lead to better performance on tasks like math problem-solving or coding.
Abstract
Reinforcement learning from human feedback (RLHF) is a popular strategy for aligning large language models (LLMs) with desired behaviors. Reward modeling is a crucial step in RLHF. However, collecting paired preference data for training reward models is often costly and time-consuming, especially for domain-specific preferences requiring expert annotation. To address this challenge, we propose the Domain knowledge merged Reward Model (DogeRM), a novel framework that integrates domain-specific knowledge into a general reward model by model merging. Our experiments demonstrate that DogeRM enhances performance across different benchmarks, and we provide a detailed analysis showcasing the effects of model merging, showing its great potential for facilitating model alignment.
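For context on the reward modeling step the abstract refers to: reward models are typically trained on paired preference data with a Bradley-Terry style objective, which is what makes that data expensive to collect. The sketch below shows that standard objective in general terms; the names (`reward_model`, `chosen_ids`, `rejected_ids`) are placeholders, and this is not claimed to be the paper's exact training setup.

```python
import torch.nn.functional as F

def pairwise_reward_loss(reward_model, chosen_ids, rejected_ids):
    """Bradley-Terry style preference loss: the model should assign a
    higher scalar reward to the preferred (chosen) response than to the
    rejected one for the same prompt."""
    r_chosen = reward_model(chosen_ids)      # scalar reward per example
    r_rejected = reward_model(rejected_ids)
    # -log sigmoid(r_chosen - r_rejected), averaged over the batch
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```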