Improving Large Vision and Language Models by Learning from a Panel of Peers
Jefferson Hernandez, Jing Shi, Simon Jenni, Vicente Ordonez, Kushal Kafle
2025-09-03
Summary
This paper introduces a new way to train large AI models that can understand both images and language, without needing a ton of feedback from people.
What's the problem?
Training these AI models to give helpful and accurate responses is hard. Usually, it requires people to rank different answers, which is expensive and time-consuming. Trying to have the AI create its own training data doesn't work well because it often makes things up, or 'hallucinates'. Existing methods just aren't efficient or reliable enough.
What's the solution?
The researchers created a system where multiple AI models act as 'peers' reviewing each other's work. They give the models a set of questions, and each model generates an answer. Then, the other models evaluate those answers, and everyone learns from the feedback. It's like a study group where students improve by critiquing and learning from each other. This process repeats, leading to better and better responses without constant human input.
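The loop described above — generate, peer-review, learn from the panel's preferred answer, repeat — can be sketched with toy stand-ins for the models. This is a minimal illustration, not the paper's actual training code: the class names, the numeric "skill" proxy for answer quality, and the update rule are all hypothetical simplifications.

```python
import random

# Toy stand-in for an LVLM on the panel. A single "skill" number is a
# hypothetical proxy for answer quality; real models would generate and
# score text, and learn via preference optimization rather than this nudge.
class ToyModel:
    def __init__(self, name, skill):
        self.name = name
        self.skill = skill

    def generate(self, prompt):
        # Answer quality is the model's current skill plus noise.
        return {"model": self.name, "prompt": prompt,
                "quality": self.skill + random.uniform(-0.1, 0.1)}

    def evaluate(self, answer):
        # A peer's score: a noisy read of the answer's latent quality.
        return answer["quality"] + random.uniform(-0.05, 0.05)

    def learn_from(self, best_answer):
        # Move toward the panel's preferred answer (never backward).
        self.skill += 0.5 * max(0.0, best_answer["quality"] - self.skill)

def panel_of_peers_round(models, prompts):
    for prompt in prompts:
        answers = [m.generate(prompt) for m in models]

        def peer_score(ans):
            # Each answer is scored only by the *other* models (peer review).
            peers = [m for m in models if m.name != ans["model"]]
            return sum(p.evaluate(ans) for p in peers) / len(peers)

        best = max(answers, key=peer_score)
        for m in models:
            m.learn_from(best)

random.seed(0)
panel = [ToyModel("A", 0.3), ToyModel("B", 0.5), ToyModel("C", 0.4)]
before = sum(m.skill for m in panel) / len(panel)
for _ in range(5):
    panel_of_peers_round(panel, ["question 1", "question 2"])
after = sum(m.skill for m in panel) / len(panel)
assert after > before  # the panel's average quality rises over rounds
```

Even in this toy setting, the weaker models are pulled toward the panel's best-rated answers each round, which mirrors the paper's point: the supervision signal comes from the peers' collective judgment rather than human rankings.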
Why it matters?
This is important because it offers a way to improve these powerful AI models much more cheaply and efficiently. Instead of relying on expensive human feedback, we can leverage the models themselves to learn and improve, making advanced AI more accessible and scalable. The results show a noticeable improvement in the AI's performance on a variety of tasks, suggesting this 'peer review' approach is a promising alternative to current training methods.
Abstract
Traditional alignment methods for Large Vision and Language Models (LVLMs) primarily rely on human-curated preference data. Human-generated preference data is costly; machine-generated preference data is limited in quality; and self-supervised preference data often introduces hallucinations. To overcome these limitations, we propose a novel Panel-of-Peers learning framework inspired by collaborative learning among humans. This approach leverages a panel of LVLMs, each evaluating and learning from their collective outputs through an iterative self-improvement process. By simulating a peer review system, our models generate, assess, and refine outputs in response to a curated set of prompts, mimicking a classroom learning environment. We demonstrate that this methodology enhances model performance without requiring extensive human-labeled datasets. Our experiments show significant improvement across multiple benchmarks, demonstrating the potential of peer evaluations as a scalable alternative to self-supervised alignment. Notably, we show that Panel-of-Peers increases the average score on fifteen benchmarks from 48% to 57%.