Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning

Shashanka Venkataramanan, Valentinos Pariza, Mohammadreza Salehi, Lukas Knobel, Spyros Gidaris, Elias Ramzi, Andrei Bursuc, Yuki M. Asano

2025-07-21

Summary

Franca is a new open-source vision foundation model. It learns visual features with nested Matryoshka clustering, a technique that groups image features at several levels of granularity while keeping the model size small.

What's the problem?

Many current vision models handle the ambiguity in grouping image features poorly. Their features also tend to carry positional biases, which make it harder for the model to capture what an image actually means.

What's the solution?

The authors propose a transparent training pipeline that combines two techniques. Nested multi-head clustering refines features from coarse to fine levels, and a positional disentanglement method removes biases tied to exact positions in the image. Together, these lead to clearer and more meaningful visual representations.
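To make the two ideas concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names, the prefix sizes, the prototype counts, and the use of a least-squares fit to find position-predictive directions are all hypothetical choices made for this example.

```python
import numpy as np

def nested_cluster_assignments(features, prototypes_per_level):
    """Assign each feature to one prototype per nesting level (hypothetical helper).

    features: (n, d) array of patch features.
    prototypes_per_level: list of (k_i, d_i) arrays with d_i <= d. Level i only
    sees the first d_i feature dimensions, so smaller levels are nested inside
    larger ones, like Matryoshka dolls.
    """
    assignments = []
    for protos in prototypes_per_level:
        d_i = protos.shape[1]
        prefix = features[:, :d_i]  # nested slice shared with every larger level
        # Cosine similarity between feature prefixes and this level's prototypes.
        prefix = prefix / np.linalg.norm(prefix, axis=1, keepdims=True)
        protos = protos / np.linalg.norm(protos, axis=1, keepdims=True)
        assignments.append(np.argmax(prefix @ protos.T, axis=1))
    return assignments

def remove_position_subspace(features, positions):
    """Project out directions that linearly predict patch position (assumed approach).

    features: (n, d) array, positions: (n, 2) array of patch coordinates.
    Returns the cleaned features and the orthonormal basis that was removed.
    """
    centered = features - features.mean(axis=0)
    pos_centered = positions - positions.mean(axis=0)
    # Least-squares map from features to positions; W has shape (d, 2).
    W, *_ = np.linalg.lstsq(centered, pos_centered, rcond=None)
    Q, _ = np.linalg.qr(W)  # orthonormal basis of the position-predictive directions
    cleaned = centered - (centered @ Q) @ Q.T
    return cleaned, Q

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))  # 8 patches, 16-dim features (toy sizes)
# Coarse to fine: fewer prototypes on short prefixes, more on longer ones.
levels = [rng.normal(size=(k, d)) for k, d in [(4, 4), (8, 8), (16, 16)]]
assign = nested_cluster_assignments(feats, levels)

# Patch coordinates on a 2 x 4 grid.
positions = np.array([(r, c) for r in range(2) for c in range(4)], dtype=float)
cleaned, Q = remove_position_subspace(feats, positions)
```

In this toy setup, `assign` holds one cluster index per patch at each of the three levels, and `cleaned` has no component along the two directions in `Q`. A real training pipeline would learn the prototypes and the projection jointly with the backbone rather than sampling them at random.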

Why does it matter?

Franca matches or even outperforms top proprietary vision models while being fully open-source and trained on public data. This gives more people access to powerful vision AI tools and helps advance research and applications in image recognition and understanding.

Abstract

Franca, an open-source vision foundation model, achieves high performance using a transparent training pipeline and novel clustering and disentanglement techniques.
