ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping

Shuang Chen, Yue Guo, Yimeng Ye, Shijue Huang, Wenbo Hu, Haoxi Li, Manyuan Zhang, Jiayu Chen, Song Guo, Nanyun Peng

2025-10-13

Summary

This paper introduces ARES, a new system designed to make large reasoning models more efficient and accurate when solving problems that involve both text and images.

What's the problem?

Current large reasoning models often struggle with finding the right balance when tackling different problems. They tend to get bogged down in unnecessary detail on easy questions, creating long and complicated explanations, but don't explore enough possibilities on hard questions, leading to incorrect answers. Essentially, they either 'overthink' simple things or 'underthink' complex ones.

What's the solution?

The researchers developed ARES, which learns to adjust how much effort it spends on a problem based on how difficult it seems. They found that while the unpredictability (entropy) of any single token choice is noisy, averaging token-level entropies over a small sliding window of text (called 'window-entropy') reliably signals when the model is at a critical point in its reasoning. ARES uses these high window-entropy moments to decide when to explore more options. It is trained in two stages: first, an Adaptive Cold-Start stage teaches the model to match the length of its reasoning to problem difficulty; second, Adaptive Entropy Policy Optimization (AEPO) teaches it to use window-entropy to control how much exploration happens during problem-solving.

Why it matters?

This work matters because it makes powerful reasoning models more practical. By allocating effort adaptively, ARES reduces the computing power needed to reach accurate answers and improves performance on a range of challenging mathematical, logical, and multimodal benchmarks, while narrowing the gap to expensive commercial systems at significantly lower inference cost.

Abstract

Recent advances in multimodal large reasoning models (MLRMs) have substantially improved their ability to solve complex textual and visual tasks. However, these models tend to overthink on simple problems, producing unnecessarily lengthy reasoning traces, while under-exploring on challenging ones, leading to missed solutions. To address this imbalance, we propose ARES, a unified open-source framework for adaptive reasoning that dynamically allocates exploration effort based on task difficulty. Our approach is motivated by two key empirical findings: (i) while single-token entropy is noisy, high window-entropy (HWE) tokens (token-level entropies averaged under a sliding window) can reliably capture reasoning-critical moments; and (ii) reducing HWE usage benefits easy problems, while increasing it is essential for solving hard ones. Building on these insights, ARES introduces a two-stage training pipeline. In the Adaptive Cold-Start stage, we curate multimodal and textual data paired with reasoning traces of length proportional to problem difficulty, equipping the model with initial difficulty awareness. In the second stage, we develop Adaptive Entropy Policy Optimization (AEPO), which uses HWE tokens as exploration triggers to decide when to explore, and a hierarchical entropy reward with dynamic KL control to decide how much to explore. Extensive experiments demonstrate that ARES achieves superior performance and reasoning efficiency across diverse mathematical, logical, and multimodal benchmarks, while closing the gap to leading commercial systems under significantly lower inference costs.
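The abstract's split between "when to explore" (HWE tokens as triggers) and "how much to explore" (a difficulty-dependent entropy reward) can be illustrated with a toy reward term. This is only a sketch of the general idea; the function name, the targets, and the weight are invented for illustration, and the paper's hierarchical entropy reward with dynamic KL control is more involved:

```python
def entropy_shaping_reward(task_reward, hwe_fraction, is_hard,
                           target_easy=0.05, target_hard=0.25, weight=0.5):
    """Toy difficulty-aware shaping term (NOT the paper's exact formula).

    Pushes the fraction of high-window-entropy tokens in a trace toward a
    difficulty-dependent target: low for easy problems (discouraging
    overthinking), higher for hard ones (encouraging exploration)."""
    target = target_hard if is_hard else target_easy
    return task_reward - weight * abs(hwe_fraction - target)
```

Under this sketch, a correct but over-exploratory trace on an easy problem scores below a correct concise one, while on a hard problem the same exploratory trace is rewarded.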