VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Guided Iterative Policy Optimization
Yunxin Li, Xinyu Chen, Zitao Li, Zhenyu Liu, Longyue Wang, Wenhan Luo, Baotian Hu, Min Zhang
2025-05-28
Summary
This paper introduces VerIPO, a training technique that helps AI models reason through long, complicated problems when they watch and analyze videos.
What's the problem?
Current AI models that work with videos often lose track of their reasoning when they must process lots of information over a long stretch of time, which makes it hard for them to answer questions accurately or explain what is happening in a video.
What's the solution?
The researchers built a training loop in which a 'verifier' checks the quality of the AI's reasoning after each round of training and uses those checks to select better examples for the next round. Alternating between these training phases helps the model learn faster and handle long, complex reasoning tasks in videos more reliably.
Why it matters?
This matters because it means AI can become much more reliable at understanding and explaining videos, which is important for things like education, security, and entertainment, where understanding long video content is often necessary.
Abstract
A Verifier-guided Iterative Policy Optimization method enhances Video-LLMs' reasoning capabilities by integrating a Rollout-Aware Verifier between GRPO and DPO phases, leading to faster and more effective optimization.
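The loop described in the abstract (GRPO rollouts, a Rollout-Aware Verifier, then DPO) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the verifier here is a stand-in that reads a precomputed score, and all data structures and function names are hypothetical.

```python
# Hedged sketch of one VerIPO-style iteration: GRPO produces several rollouts
# per question, a verifier scores them, and the best/worst pair per question
# becomes a (chosen, rejected) preference pair for a subsequent DPO phase.
# All names and the scoring scheme are illustrative assumptions.

def verifier_score(rollout):
    # Hypothetical Rollout-Aware Verifier: here it simply reads a stored
    # quality score; the real verifier would inspect the reasoning chain.
    return rollout["score"]

def build_dpo_pairs(rollouts):
    """Group rollouts by question and pair the highest- and lowest-scoring
    ones as (chosen, rejected) preference data for DPO."""
    by_question = {}
    for r in rollouts:
        by_question.setdefault(r["question"], []).append(r)

    pairs = []
    for question, group in by_question.items():
        ranked = sorted(group, key=verifier_score, reverse=True)
        # Only emit a pair when there is a real quality gap to learn from.
        if len(ranked) >= 2 and verifier_score(ranked[0]) > verifier_score(ranked[-1]):
            pairs.append({
                "question": question,
                "chosen": ranked[0]["answer"],
                "rejected": ranked[-1]["answer"],
            })
    return pairs

# Toy rollouts standing in for GRPO outputs on a video QA task.
rollouts = [
    {"question": "q1", "answer": "step-by-step reasoning A", "score": 0.9},
    {"question": "q1", "answer": "unsupported guess B", "score": 0.2},
    {"question": "q2", "answer": "lone answer C", "score": 0.5},  # no pair possible
]

pairs = build_dpo_pairs(rollouts)
```

In this sketch, `q2` produces no preference pair because a single rollout gives the verifier nothing to contrast; iterating the loop then feeds the selected pairs into DPO before the next round of rollouts.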