RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
Peisong Wang, Ruotian Ma, Bang Zhang, Xingyu Chen, Zhiwei He, Kang Luo, Qingsong Lv, Qingxuan Jiang, Zheng Xie, Shanyi Wang, Yuan Li, Fanghua Ye, Jian Li, Yifan Yang, Zhaopeng Tu, Xiaolong Li
2025-07-09
Summary
This paper introduces RLVER, a method for training language models to be more emotionally intelligent: reinforcement learning with rewards derived from how well the model's responses improve a simulated user's emotions over the course of a conversation.
What's the problem?
While AI models can excel at logical tasks, they struggle to respond with genuine empathy. Existing methods rely on limited training data or simple hand-crafted rules, which generalize poorly to the diversity of real conversations.
What's the solution?
The researchers created RLVER, which trains the model against simulated users that have detailed personas and evolving emotional states. After each of the model's responses, the simulated user produces a clear, verifiable emotion score reflecting how the response affected its feelings. Through this ongoing feedback loop, the model learns to respond in ways that improve those emotion scores, making its empathy more consistent and genuine. A minimal sketch of one such rollout appears below.
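To make the feedback loop concrete, here is a minimal sketch of how one training rollout could look. Everything here is illustrative: SimulatedUser, generate_reply, and the random stub standing in for the LLM-based emotion judge are hypothetical names, not the authors' implementation, and treating the per-turn emotion change as the reward is one design choice among several (the final emotion score is another).

```python
"""Minimal sketch of an RLVER-style rollout (hypothetical names throughout).

The simulated user carries a persona and a scalar emotion score; after each
assistant turn it re-scores its emotion, and the change in score serves as
the verifiable reward for a later policy-gradient update.
"""

import random
from dataclasses import dataclass, field


@dataclass
class SimulatedUser:
    persona: str
    emotion: float = 0.0                 # scalar in [-1, 1]; higher = happier
    history: list = field(default_factory=list)

    def react(self, reply: str) -> float:
        """Score the reply's emotional impact and update internal state.

        RLVER uses an LLM-based simulator for this judgment; a random stub
        stands in here so the sketch runs end to end.
        """
        delta = random.uniform(-0.2, 0.3)          # stub for the LLM judge
        self.emotion = max(-1.0, min(1.0, self.emotion + delta))
        self.history.append(reply)
        return self.emotion


def generate_reply(persona: str, history: list) -> str:
    """Placeholder for the policy LLM being trained."""
    return f"I hear you; tell me more. (turn {len(history) + 1})"


def rollout_episode(user: SimulatedUser, num_turns: int = 5) -> list:
    """Run one dialogue and collect per-turn emotion rewards."""
    rewards = []
    for _ in range(num_turns):
        reply = generate_reply(user.persona, user.history)
        before = user.emotion
        after = user.react(reply)
        rewards.append(after - before)   # verifiable reward: emotion change
    return rewards


if __name__ == "__main__":
    user = SimulatedUser(persona="anxious grad student facing a deadline")
    rewards = rollout_episode(user)
    print("per-turn rewards:", [round(r, 2) for r in rewards])
    print("episode return:", round(sum(rewards), 2))
    # In RLVER these returns would drive a reinforcement learning update
    # of the policy LLM; that optimizer step is omitted from this sketch.
```

Because the reward comes from the simulator's own emotion state rather than a human label, it is cheap to generate at scale and verifiable turn by turn, which is what makes the end-to-end reinforcement learning loop practical.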
Why it matters?
This matters because emotionally intelligent AI can interact with people in more understanding and supportive ways, making assistants better at counseling, customer service, and everyday social interaction, all without sacrificing their ability to reason logically.
Abstract
RLVER is an end-to-end reinforcement learning framework that uses verifiable emotion rewards to enhance the emotional intelligence of LLMs without sacrificing their cognitive abilities.