OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
Zhaochen Su, Linjie Li, Mingyang Song, Yunzhuo Hao, Zhengyuan Yang, Jun Zhang, Guanjie Chen, Jiawei Gu, Juntao Li, Xiaoye Qu, Yu Cheng
2025-05-16
Summary
This paper introduces OpenThinkIMG, a system that teaches AI models to use external visual tools when reasoning about images, especially charts and graphs.
What's the problem?
Even advanced AI models struggle to understand and reason about complex images such as charts, because they cannot easily call on external tools to help them interpret the data and make decisions.
What's the solution?
The researchers developed OpenThinkIMG together with V-ToolRL, a reinforcement learning technique that trains the model to decide when and how to use different vision tools while analyzing an image. The trained model outperforms existing methods on chart reasoning tasks.
Why does it matter?
This makes AI more useful for tasks such as analyzing scientific data or business reports, and for any situation where understanding visual information is important, so these systems become more practical for real-world use.
Abstract
OpenThinkIMG and V-ToolRL enable large vision-language models (LVLMs) to learn adaptive policies for using external vision tools, outperforming existing methods on chart reasoning tasks.