
Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning

Haozhen Zhang, Tao Feng, Jiaxuan You

2025-06-18


Summary

This paper introduces Router-R1, a reinforcement learning system that teaches a large language model to decide how to use multiple other models together by thinking and routing tasks over multiple rounds.

What's the problem?

The problem is that most current systems assign each task to a single model without considering whether several models working together could do better, which limits their ability to handle complex questions that require many different skills.

What's the solution?

The researchers built Router-R1, which acts like a smart router trained with reinforcement learning. It thinks internally, decides when and which models to ask for help over several rounds, and combines their answers step by step to get better results while balancing quality against speed and cost.
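The multi-round think-and-route loop described above can be sketched roughly as follows. This is an illustrative toy, not the paper's actual implementation: the model choice rule, stopping rule, and aggregation by majority vote are all assumptions made for the sketch.

```python
# Hypothetical sketch of a multi-round routing loop: the router alternates
# "think" (pick a model) and "route" (query it) steps, collecting answers
# and aggregating them into a final result.

def router_r1(question, candidate_models, max_rounds=3):
    """Route a question over several rounds, aggregating partial answers."""
    evidence = []
    for round_num in range(max_rounds):
        # "Think": decide which model to query next (round-robin stub here;
        # the real router learns this decision via reinforcement learning).
        chosen = candidate_models[round_num % len(candidate_models)]
        # "Route": query the chosen model (stubbed as a plain callable).
        answer = chosen(question, evidence)
        evidence.append(answer)
        # Stop early once the evidence looks settled (toy rule: two
        # consecutive rounds agree).
        if len(evidence) >= 2 and evidence[-1] == evidence[-2]:
            break
    # Aggregate: return the most common answer among collected evidence.
    return max(set(evidence), key=evidence.count)

# Two toy "models" that happen to agree, so the loop stops after round 2.
model_a = lambda q, ev: "42"
model_b = lambda q, ev: "42"
print(router_r1("What is 6 * 7?", [model_a, model_b]))  # prints 42
```

In the paper the routing decision is learned, so the router can send easy questions to cheap models and hard ones to stronger models; the round-robin stub above only marks where that learned policy would plug in.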

Why does it matter?

This matters because using multiple models together intelligently lets AI handle harder problems more effectively and efficiently, making AI systems smarter and more useful in real-world applications.

Abstract

Router-R1, a reinforcement learning-based framework, improves multi-LLM routing by interleaving think and route actions, optimizing performance-cost trade-offs, and generalizing to unseen models.
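The performance-cost trade-off mentioned in the abstract can be expressed as a simple scalar reward. The reward shape and the cost weight below are illustrative assumptions, not values taken from the paper.

```python
def routing_reward(answer_quality, total_cost, cost_weight=0.1):
    """Hypothetical scalar reward balancing answer quality against cost."""
    # Higher-quality answers raise the reward; every unit of routing cost
    # (e.g. tokens spent querying expensive models) subtracts cost_weight.
    return answer_quality - cost_weight * total_cost

# A correct answer (quality 1.0) found cheaply scores higher than the same
# answer found expensively, pushing the policy toward efficient routing.
print(routing_reward(1.0, 2.0))  # prints 0.8
print(routing_reward(1.0, 5.0))  # prints 0.5
```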