AnyUp: Universal Feature Upsampling

Thomas Wimmer, Prune Truong, Marie-Julie Rakotosaona, Michael Oechsle, Federico Tombari, Bernt Schiele, Jan Eric Lenssen

2025-10-17

Summary

This paper introduces AnyUp, a method for upsampling image features: it increases their spatial resolution so that they carry finer detail.

What's the problem?

Current learning-based feature upsamplers have to be trained separately for each feature extractor, such as DINO or CLIP. If you switch to a different extractor, you have to retrain the upsampler, which is time-consuming and limits flexibility. Without that retraining, these upsamplers do not generalize to other feature types.

What's the solution?

AnyUp solves this with an upsampling architecture that works 'out of the box' with any vision feature at any resolution. It is feature-agnostic at inference time: the same upsampler can be applied to features from different extractors, with no encoder-specific training. It raises the resolution of the features without needing to know which model produced them.
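To make the setting concrete, here is a minimal sketch of the simplest feature-agnostic baseline: plain bilinear interpolation of a feature map, written with NumPy. This is not AnyUp's architecture, which this summary does not describe; it only illustrates what "works with any feature type" means for an upsampler's interface. The function name `bilinear_upsample` and the channel counts and grid sizes in the usage lines are illustrative assumptions, not values from the paper.

```python
import numpy as np


def bilinear_upsample(features: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinearly upsample a (C, h, w) feature map to (C, out_h, out_w).

    Nothing here depends on C or on which encoder produced the features,
    so the same function applies to any feature type (hypothetical baseline).
    """
    c, h, w = features.shape

    # Map output pixel centers back to input coordinates (half-pixel convention).
    ys = np.clip((np.arange(out_h) + 0.5) * h / out_h - 0.5, 0, h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * w / out_w - 0.5, 0, w - 1)

    # Indices of the neighbouring input cells along each axis.
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)

    # Interpolation weights, shaped to broadcast over (C, out_h, out_w).
    wy = (ys - y0)[None, :, None]
    wx = (xs - x0)[None, None, :]

    # Interpolate horizontally on the upper and lower rows, then vertically.
    top = features[:, y0][:, :, x0] * (1 - wx) + features[:, y0][:, :, x1] * wx
    bottom = features[:, y1][:, :, x0] * (1 - wx) + features[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bottom * wy


# Two stand-in feature maps with different channel counts (arbitrary values),
# as if they came from two different encoders.
feats_a = np.random.rand(384, 16, 16)
feats_b = np.random.rand(768, 14, 14)
up_a = bilinear_upsample(feats_a, 64, 64)
up_b = bilinear_upsample(feats_b, 56, 56)
```

A fixed interpolation like this needs no training at all, but it cannot recover fine detail that the low-resolution features lack. Learned upsamplers aim to do better, and the point made above is that existing ones give up this kind of encoder independence.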

Why does it matter?

A general-purpose upsampler removes the retraining step when working with different image features, which makes high-resolution features easier to use across computer vision tasks. In the paper's experiments, AnyUp sets a new state of the art for upsampled features, generalizes to different feature types, and preserves feature semantics, while remaining efficient and easy to apply to a wide range of downstream tasks.

Abstract

We introduce AnyUp, a method for feature upsampling that can be applied to any vision feature at any resolution, without encoder-specific training. Existing learning-based upsamplers for features like DINO or CLIP need to be re-trained for every feature extractor and thus do not generalize to different feature types at inference time. In this work, we propose an inference-time feature-agnostic upsampling architecture to alleviate this limitation and improve upsampling quality. In our experiments, AnyUp sets a new state of the art for upsampled features, generalizes to different feature types, and preserves feature semantics while being efficient and easy to apply to a wide range of downstream tasks.
