MOA: Multi-Objective Alignment for Role-Playing Agents

Chonghua Liao, Ke Wang, Yuchuan Wu, Fei Huang, Yongbin Li

2025-12-12

Summary

This paper introduces a new method, called MOA (Multi-Objective Alignment), for creating better role-playing agents – AI programs designed to convincingly act as different characters in conversations.

What's the problem?

Building these agents is tricky because they need to be good at several things at once: following instructions across multiple conversation turns, knowing facts about the role they're playing, and consistently sounding like that character. Existing methods either get stuck copying the style of the training data without being very creative (supervised fine-tuning), or they struggle to improve across all these different areas simultaneously through trial-and-error (reinforcement) learning.

What's the solution?

MOA uses reinforcement learning, but it's designed to handle multiple goals at the same time. Instead of just trying to make the agent generally 'good,' it scores and improves specific aspects of performance separately – fine-grained rubrics such as how well the agent sticks to the character's personality and how accurately it uses knowledge about the role. The authors also added a technique, thought-augmented rollout with off-policy guidance, that helps the agent think through its responses and learn from past successful strategies, even if those strategies weren't directly produced by its current policy.
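To make the multi-objective idea concrete, here is a minimal sketch of how per-rubric rewards could be normalized separately and then combined into a single advantage signal for an RL update. This is an illustration only, not the paper's actual algorithm: the rubric names, the weights, and the weighted-sum combination are assumptions made for the example.

```python
# Illustrative sketch (not MOA's actual implementation): combine several
# rubric scores into one advantage per sampled response. Normalizing each
# rubric separately keeps one objective from dominating the others.

def normalize(scores):
    """Standardize a list of scores to zero mean, unit std."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / n
    std = var ** 0.5 or 1.0  # avoid division by zero when all scores tie
    return [(s - mean) / std for s in scores]

def combined_advantages(rubric_scores, weights):
    """rubric_scores: {rubric_name: [score per sampled response]}.
    Returns one scalar advantage per response, a weighted mix of the
    per-rubric normalized scores."""
    n = len(next(iter(rubric_scores.values())))
    advs = [0.0] * n
    for rubric, scores in rubric_scores.items():
        for i, z in enumerate(normalize(scores)):
            advs[i] += weights[rubric] * z
    return advs

# Four sampled responses to one prompt, scored on three hypothetical rubrics.
scores = {
    "instruction_following": [0.9, 0.4, 0.7, 0.2],
    "role_knowledge":        [0.6, 0.8, 0.5, 0.3],
    "persona_style":         [0.7, 0.6, 0.9, 0.1],
}
weights = {"instruction_following": 0.4, "role_knowledge": 0.3, "persona_style": 0.3}
advs = combined_advantages(scores, weights)
```

Responses with a higher combined advantage would then be reinforced by the policy-gradient update; response 3, which scores lowest on every rubric here, ends up with the most negative advantage.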

Why it matters?

This research is important because it shows we can build AI characters that are much more realistic and engaging. The MOA method allows an 8 billion parameter model to perform as well as, or even better than, much larger and more powerful models like GPT-4o and Claude in role-playing scenarios, suggesting a path towards creating high-quality, versatile role-playing agents without needing massive computing resources.

Abstract

Role-playing agents (RPAs) must simultaneously master many conflicting skills -- following multi-turn instructions, exhibiting domain knowledge, and adopting a consistent linguistic style. Existing work either relies on supervised fine-tuning (SFT) that over-fits surface cues and yields low diversity, or applies reinforcement learning (RL) that fails to learn multiple dimensions for comprehensive RPA optimization. We present MOA (Multi-Objective Alignment), a reinforcement-learning framework that enables multi-dimensional, fine-grained rubric optimization for general RPAs. MOA introduces a novel multi-objective optimization strategy that trains simultaneously on multiple fine-grained rubrics to boost optimization performance. Besides, to address the issues of model output diversity and quality, we have also employed thought-augmented rollout with off-policy guidance. Extensive experiments on challenging benchmarks such as PersonaGym and RoleMRC show that MOA enables an 8B model to match or even outperform strong baselines such as GPT-4o and Claude across numerous dimensions. This demonstrates the great potential of MOA in building RPAs that can simultaneously meet the demands of role knowledge, persona style, diverse scenarios, and complex multi-turn conversations.