From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models

Hongrui Jia, Chaoya Jiang, Shikun Zhang, Wei Ye

2026-02-27

Summary

This paper introduces Diagnostic-driven Progressive Evolution (DPE), a new way to train large AI models that understand both text and images. The idea is to make these models better at reasoning and decision-making by continuously testing them, diagnosing their weaknesses, and improving them.

What's the problem?

Currently, training these powerful AI models relies on a fixed set of data and instructions. This makes it hard to figure out *why* a model fails at certain tasks, or to quickly focus training on its weaknesses. It's like trying to fix a car without knowing what's broken, or repeatedly practicing something you're already good at instead of working on the areas where you struggle.

What's the solution?

The researchers developed DPE, which works like a cycle of testing, diagnosing, and then retraining. First, multiple AI 'agents' create a large amount of new training data, using tools like web searches and image editing to make it realistic. Then, when the model makes a mistake, the system tries to pinpoint *exactly* what the model didn't understand. Based on this diagnosis, the agents create more training data specifically designed to address those weaknesses, and the model is retrained. This process repeats, continually improving the model's abilities.
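The test-diagnose-retrain cycle described above can be sketched in a few lines of Python. This is a minimal toy sketch, not the paper's implementation: every function name (`evaluate`, `diagnose_failures`, `generate_weakness_data`, `retrain`) is a hypothetical stand-in, and the agents' tools (web search, image editing) are replaced with toy data keyed by a "skill" label.

```python
from collections import Counter

def evaluate(model, benchmark):
    """Return the benchmark items the model gets wrong (toy rule:
    the model fails any item whose skill it hasn't acquired)."""
    return [item for item in benchmark if item["skill"] not in model["skills"]]

def diagnose_failures(failures):
    """Attribute failures to specific weaknesses by counting which
    skill each failed item required."""
    return Counter(item["skill"] for item in failures)

def generate_weakness_data(diagnosis, n_per_skill=4):
    """In DPE, agents would synthesize weakness-focused samples here;
    we just emit toy samples for each diagnosed weak skill."""
    return [{"skill": skill}
            for skill, _count in diagnosis.most_common()
            for _ in range(n_per_skill)]

def retrain(model, data):
    """Stand-in for targeted reinforcement: the toy model simply
    'acquires' the skills present in its training data."""
    model["skills"].update(sample["skill"] for sample in data)
    return model

def dpe_loop(model, benchmark, max_rounds=5):
    """Spiral loop: each round re-diagnoses the *updated* model,
    so later rounds target whatever weaknesses remain."""
    for _ in range(max_rounds):
        failures = evaluate(model, benchmark)
        if not failures:  # no remaining blind spots
            break
        diagnosis = diagnose_failures(failures)
        targeted_data = generate_weakness_data(diagnosis)
        model = retrain(model, targeted_data)
    return model

# Toy run: the model starts with only a 'caption' skill and the
# benchmark also probes 'chart' and 'ocr'.
model = {"skills": {"caption"}}
benchmark = [{"skill": s} for s in ["caption", "chart", "ocr", "chart"]]
model = dpe_loop(model, benchmark)
```

After the loop, the toy model has picked up the skills it was initially failing on. The point of the sketch is the control flow: diagnosis drives data generation, and retraining changes what the next diagnosis sees.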

Why it matters?

This research is important because it provides a scalable method for continually improving these large AI models without needing to manually adjust the training process. It allows the models to learn and adapt to new challenges more effectively, making them more reliable and useful in a wider range of real-world applications. It's a step towards AI that can learn and improve on its own, similar to how humans learn from their mistakes.

Abstract

As Large Multimodal Models (LMMs) scale up and reinforcement learning (RL) methods mature, LMMs have made notable progress in complex reasoning and decision-making. Yet training still relies on static data and fixed recipes, making it difficult to diagnose capability blind spots or provide dynamic, targeted reinforcement. Motivated by findings that test-driven error exposure and feedback-based correction outperform repetitive practice, we propose Diagnostic-driven Progressive Evolution (DPE), a spiral loop where diagnosis steers data generation and reinforcement, and each iteration re-diagnoses the updated model to drive the next round of targeted improvement. DPE has two key components. First, multiple agents annotate and quality-control massive unlabeled multimodal data, using tools such as web search and image editing to produce diverse, realistic samples. Second, DPE attributes failures to specific weaknesses, dynamically adjusts the data mixture, and guides agents to generate weakness-focused data for targeted reinforcement. Experiments on Qwen3-VL-8B-Instruct and Qwen2.5-VL-7B-Instruct show stable, continual gains across eleven benchmarks, indicating DPE as a scalable paradigm for continual LMM training under open task distributions. Our code, models, and data are publicly available at https://github.com/hongruijia/DPE.