Mixture of Horizons in Action Chunking

Dong Jing, Gang Wang, Jiaqi Liu, Weiliang Tang, Zelong Sun, Yunchao Yao, Zhenyu Wei, Yunhui Liu, Zhiwu Lu, Mingyu Ding

2025-12-03

Summary

This paper focuses on improving how robots learn to perform tasks from visual observations and language instructions, specifically addressing a challenge with how far ahead the robot plans its actions.

What's the problem?

When training these 'vision-language-action' models, researchers found a tricky balance. If the robot plans too far ahead (long horizon), it gets the big picture but messes up the small details. If it plans only for the immediate future (short horizon), it does the small things well but can't handle tasks that require multiple steps. Choosing a single planning length isn't ideal because it always sacrifices something.

What's the solution?

The researchers came up with a 'mixture of horizons' approach: the robot plans using multiple different lengths simultaneously. It rearranges the action plan into segments, each with its own horizon, processes them all at once with a shared model, and then combines the results with a lightweight gate. This allows the robot to benefit from both long-term foresight and short-term precision without being limited to just one or the other. It also allows the robot to dynamically choose the most stable actions by checking for agreement across the different planning lengths, which speeds up inference.
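To make the idea concrete, here is a minimal sketch of horizon mixing: several horizons are predicted with one shared weight matrix standing in for the shared action module, then fused step-by-step with a softmax gate. All names, shapes, and the linear stand-in for the transformer are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

CHUNK = 12             # total action chunk length
HORIZONS = [12, 6, 3]  # planning lengths processed in parallel
ACT_DIM = 7            # e.g. a 7-DoF arm action

# Shared "action module" stand-in: one weight matrix reused for
# every horizon segment (here just a linear map).
W_shared = rng.normal(size=(ACT_DIM, ACT_DIM))

def predict_segment(obs_feat, horizon):
    """Predict `horizon` actions from an observation feature,
    reusing the same shared weights for every horizon."""
    base = obs_feat @ W_shared
    return np.stack([base for _ in range(horizon)])

obs = rng.normal(size=(ACT_DIM,))

# Each horizon covers the first `h` steps of the chunk.
per_horizon = {h: predict_segment(obs, h) for h in HORIZONS}

# Lightweight linear gate: softmax weights over the horizons.
gate_logits = rng.normal(size=(len(HORIZONS),))
gate = np.exp(gate_logits) / np.exp(gate_logits).sum()

# Fuse: each time step averages the horizons that cover it,
# weighted by the gate, then renormalizes.
fused = np.zeros((CHUNK, ACT_DIM))
weight = np.zeros(CHUNK)
for g, h in zip(gate, HORIZONS):
    fused[:h] += g * per_horizon[h]
    weight[:h] += g
fused /= weight[:, None]

print(fused.shape)  # (12, 7)
```

Early steps of the chunk are covered by all three horizons and so blend long-term and short-term predictions, while later steps fall back to the longer horizons alone.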

Why it matters?

This is important because it makes robots much better at complex tasks. By overcoming the limitations of fixed planning lengths, the robot can learn more effectively and perform tasks with a higher success rate, even in real-world situations. The method is also easy to add to existing robot learning systems and significantly improves how quickly and reliably robots can learn new skills.

Abstract

Vision-language-action (VLA) models have shown remarkable capabilities in robotic manipulation, but their performance is sensitive to the action chunk length used during training, termed horizon. Our empirical study reveals an inherent trade-off: longer horizons provide stronger global foresight but degrade fine-grained accuracy, while shorter ones sharpen local control yet struggle on long-term tasks, implying that a fixed choice of a single horizon is suboptimal. To mitigate the trade-off, we propose a mixture of horizons (MoH) strategy. MoH rearranges the action chunk into several segments with different horizons, processes them in parallel with a shared action transformer, and fuses outputs with a light linear gate. It has three appealing benefits. 1) MoH exploits long-term foresight and short-term precision jointly within a single model, improving both performance and generalizability to complex tasks. 2) MoH is plug-and-play for full-attention action modules with minimal training or inference overhead. 3) MoH enables dynamic inference with adaptive horizons, which selects stable actions through cross-horizon consensus, achieving 2.5× higher throughput than baselines while preserving superior performance. Extensive experiments over flow-based policies π_0, π_{0.5}, and the one-step regression policy π_{reg} demonstrate that MoH yields consistent and significant gains on both simulations and real-world tasks. Notably, under the mixed-task setting, π_{0.5} with MoH reaches a new state-of-the-art with a 99% average success rate on LIBERO after only 30k training iterations. Project page: https://github.com/Timsty1/MixtureOfHorizons
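The dynamic-inference idea in the abstract, selecting stable actions through cross-horizon consensus, can be sketched roughly as follows. The function name, the tolerance threshold, and the disagreement measure are hypothetical illustrations, not the paper's actual procedure: the policy executes the longest prefix of steps on which its different horizons agree before it replans.

```python
import numpy as np

def consensus_steps(preds, tol=0.1):
    """preds: dict mapping horizon h -> (h, act_dim) predicted actions.
    Returns how many leading steps to execute: the longest prefix
    (at least 1 step) where all horizons stay within `tol` of their
    per-step mean, over the steps every horizon covers."""
    shortest = min(preds)
    # (num_horizons, shortest, act_dim)
    stack = np.stack([p[:shortest] for p in preds.values()])
    # Per-step spread: max deviation from the cross-horizon mean.
    spread = np.abs(stack - stack.mean(axis=0)).max(axis=(0, 2))
    agree = spread <= tol
    if agree.all():
        return shortest
    first_disagree = int(np.argmax(~agree))
    return max(first_disagree, 1)

# Demo: two horizons that agree everywhere on the shared prefix.
h3 = np.zeros((3, 7))
h6 = np.zeros((6, 7))
print(consensus_steps({3: h3, 6: h6}))  # prints 3
```

When the horizons agree, more steps are executed per forward pass, which is the mechanism behind the reported throughput gain over fixed-horizon baselines.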