Counteracting Matthew Effect in Self-Improvement of LVLMs through Head-Tail Re-balancing
Xin Guo, Zhiheng Xi, Yiwen Ding, Yitao Zhai, Xiaowei Shi, Xunliang Cai, Tao Gui, Qi Zhang, Xuanjing Huang
2025-10-31
Summary
This paper investigates a problem with how large vision-language models (LVLMs) get better at reasoning. These models learn through self-improvement: they explore problems on their own, keep the reasoning attempts that succeed, and train on those successes. The researchers found, however, that this process isn't always effective.
What's the problem?
When these models try to improve themselves, they generate plenty of high-quality solutions for easy problems (the "head" of the data distribution) but very few for hard ones (the "tail"). As a result, the training data skews toward what the model already does well, reinforcing simple reasoning skills instead of pushing it to learn more complex ones. This imbalance compounds over iterations, like the saying "the rich get richer," which the authors call the "Matthew effect," and it ultimately caps how much the model can improve.
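The compounding dynamic above can be illustrated with a toy simulation. The dynamics here are an assumption for illustration, not the paper's model: suppose that each round, a skill improves in proportion to how often the model already succeeds at it, since successes are what it trains on. The gap between easy and hard skills then widens every iteration:

```python
# Toy simulation of the "Matthew effect" (assumed dynamics, not the
# paper's actual training process): skills improve in proportion to
# current success rate, so the easy/hard gap compounds over rounds.
easy, hard = 0.6, 0.2  # hypothetical initial success rates
initial_gap = easy - hard

for step in range(5):
    # each round, improvement is proportional to existing success,
    # because successful trajectories dominate the training data
    easy = min(1.0, easy * 1.1)
    hard = min(1.0, hard * 1.1)
    print(f"iter {step + 1}: easy={easy:.2f} hard={hard:.2f} "
          f"gap={easy - hard:.2f}")

final_gap = easy - hard
```

Both skills improve, but the gap grows every round: the model gets "richer" fastest where it was already rich.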
What's the solution?
To fix this, the researchers developed four strategies that re-balance learning across head and tail data, working from two perspectives. The first, distribution-reshaping, adjusts the mix of problems the model practices on so that harder queries are better represented. The second, trajectory-resampling, changes how the model selects which successful attempts to learn from, giving more weight to those that solved harder problems. Together, these methods balance the model's learning signal and improve its overall reasoning ability.
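The trajectory-resampling idea can be sketched in a few lines. This is a minimal illustration under assumptions, not the authors' exact method: the function names and the difficulty-proportional weighting scheme are hypothetical, standing in for whatever re-weighting the paper's strategies use.

```python
import random

def resample_trajectories(trajectories, k, seed=0):
    """Draw k successful trajectories, weighted toward harder queries.

    Each trajectory is a dict with a 'difficulty' score in [0, 1],
    where higher means harder (e.g. 1 - empirical success rate).
    This is an illustrative sketch, not the paper's implementation.
    """
    rng = random.Random(seed)
    # weight by difficulty so tail-query successes are not drowned
    # out by the abundance of head-query successes
    weights = [t["difficulty"] + 1e-6 for t in trajectories]
    return rng.choices(trajectories, weights=weights, k=k)

# Example pool: many easy ("head") successes, few hard ("tail") ones.
pool = [{"id": i, "difficulty": 0.1} for i in range(90)]
pool += [{"id": 90 + i, "difficulty": 0.9} for i in range(10)]

sample = resample_trajectories(pool, k=50)
hard_share = sum(t["difficulty"] > 0.5 for t in sample) / len(sample)
print(f"hard-query share in resampled training set: {hard_share:.2f}")
```

Although hard-query trajectories make up only 10% of the pool, difficulty-weighted sampling gives them a much larger share of the training set, counteracting the head-heavy skew.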
Why it matters?
This research is important because it identifies a key limitation in how large AI models are currently improved. By correcting the head-tail imbalance, the proposed techniques outperform vanilla self-improvement by 3.86 points on average across visual reasoning tasks, making these models more capable on the complex reasoning problems that matter in real-world applications.
Abstract
Self-improvement has emerged as a mainstream paradigm for advancing the reasoning capabilities of large vision-language models (LVLMs), where models explore and learn from successful trajectories iteratively. However, we identify a critical issue during this process: the model excels at generating high-quality trajectories for simple queries (i.e., head data) but struggles with more complex ones (i.e., tail data). This leads to an imbalanced optimization that drives the model to prioritize simple reasoning skills, while hindering its ability to tackle more complex reasoning tasks. Over iterations, this imbalance becomes increasingly pronounced--a dynamic we term the "Matthew effect"--which ultimately hinders further model improvement and leads to performance bottlenecks. To counteract this challenge, we introduce four efficient strategies from two perspectives: distribution-reshaping and trajectory-resampling, to achieve head-tail re-balancing during the exploration-and-learning self-improvement process. Extensive experiments on Qwen2-VL-7B-Instruct and InternVL2.5-4B models across visual reasoning tasks demonstrate that our methods consistently improve visual reasoning capabilities, outperforming vanilla self-improvement by 3.86 points on average.