PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

Jang Hyun Cho, Andrea Madotto, Effrosyni Mavroudi, Triantafyllos Afouras, Tushar Nagarajan, Muhammad Maaz, Yale Song, Tengyu Ma, Shuming Hu, Suyog Jain, Miguel Martin, Huiyu Wang, Hanoona Rasheed, Peize Sun, Po-Yao Huang, Daniel Bolya, Nikhila Ravi, Shashank Jain, Tammy Stark, Shane Moon, Babak Damavandi, Vivian Lee

2025-04-18

Summary

This paper introduces PerceptionLM, a new open-source AI model designed to deeply understand images and videos, built using a mix of computer-generated and real, human-labeled data. The model and its training data are fully transparent and available to everyone.

What's the problem?

Most advanced AI models for understanding pictures and videos are either closed off to the public or built using private data and secret techniques, often by distilling from proprietary models. This makes it hard for researchers and developers to study how these models work, improve them, or build on them for new projects, especially if they can't afford expensive proprietary tools.

What's the solution?

The researchers created PerceptionLM by gathering a large and diverse set of both synthetic and human-labeled data, then trained the model openly so anyone can see exactly how it was built. They also introduced PLM-VideoBench, a new suite of tests that measures how well such models understand videos, making it easier to compare different models fairly.

Why it matters?

This matters because it gives everyone access to powerful tools for visual understanding, encourages more open research, and helps make AI fairer and more trustworthy by letting the community see exactly how these models work and how well they perform.

Abstract

A fully transparent Perception Language Model (PLM) for image and video understanding is developed without relying on proprietary models, using large-scale synthetic and human-labeled data and introducing PLM-VideoBench for evaluation.