Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images

Boyang Deng, Songyou Peng, Kyle Genova, Gordon Wetzstein, Noah Snavely, Leonidas Guibas, Thomas Funkhouser

2025-04-14

Summary

This paper introduces Visual Chronicles, a system that uses multimodal LLMs, AI models that can interpret both images and text, to analyze massive collections of images. The goal is to find patterns and trends over time, such as how styles or events change, without any labels or categories defined beforehand.

What's the problem?

Analyzing massive image collections for trends or changes over time is hard because most systems need the images to be labeled or organized first. Labeling takes a lot of time and effort, and it limits what you can find, since you only look for things you already know about.

What's the solution?

The researchers created Visual Chronicles, which uses multimodal large language models (LLMs) to automatically examine large numbers of images and spot patterns or changes over time. The system needs no labels or guidance about what to look for. It discovers open-ended trends on its own, and the authors report that it outperforms baseline methods.

Why does it matter?

This work matters because it makes it possible to study huge image collections, such as social media photos or historical archives, faster and without deciding in advance what to look for. Visual Chronicles can help researchers, historians, and anyone curious about visual trends find insights and stories that would be almost impossible to discover by hand.

Abstract

A system using Multimodal LLMs analyzes large image datasets to discover open-ended temporal patterns and trends without predefined labels, achieving superior performance compared to baselines.
