
Edge Weight Prediction For Category-Agnostic Pose Estimation

Or Hirschorn, Shai Avidan

2024-11-26


Summary

This paper introduces EdgeCape, a new framework for Category-Agnostic Pose Estimation (CAPE) that uses a graph-based approach to improve how models localize keypoints on objects from many different categories.

What's the problem?

Current CAPE methods rely on a fixed pose graph whose edges all have equal weight, which limits how accurately keypoints can be localized on different objects. This becomes a problem when parts of an object are occluded or when an object has symmetric, similar-looking parts, because the model struggles to tell the corresponding keypoints apart.
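
To make this limitation concrete, here is a minimal NumPy sketch (with a made-up 4-keypoint skeleton and illustrative weight values, not taken from the paper) contrasting a fixed equal-weight adjacency matrix with one whose edge weights differ per connection:

```python
import numpy as np

# Hypothetical 4-keypoint chain skeleton, e.g. head-neck-shoulder-elbow.
edges = [(0, 1), (1, 2), (2, 3)]
num_kpts = 4

# Prior graph-based CAPE methods: fixed adjacency, every edge weighted 1.0.
A_fixed = np.zeros((num_kpts, num_kpts))
for i, j in edges:
    A_fixed[i, j] = A_fixed[j, i] = 1.0

# EdgeCape's idea: each edge carries its own weight, so some connections can
# contribute more to localization than others. The values below are
# illustrative only; in the paper they are predicted by the model.
example_weights = {(0, 1): 0.9, (1, 2): 0.4, (2, 3): 0.7}
A_weighted = np.zeros((num_kpts, num_kpts))
for (i, j), w in example_weights.items():
    A_weighted[i, j] = A_weighted[j, i] = w

print(A_fixed)
print(A_weighted)
```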

What's the solution?

EdgeCape addresses this issue by predicting a weight for each edge in the pose graph rather than treating all edges as equal, allowing more flexible and accurate keypoint localization. It also introduces a Markovian Structural Bias that modulates the self-attention between keypoints based on the number of hops separating them in the graph, which helps the model capture global spatial relationships. The authors evaluated EdgeCape on the MP-100 benchmark and found that it significantly improved keypoint localization accuracy compared to previous methods.
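
As a rough illustration of the edge-weight idea, here is a minimal PyTorch sketch (written for this summary, not the authors' released code) that predicts a weight for every skeleton edge from its two endpoint features and then uses the resulting weighted adjacency in a simple graph-convolution update:

```python
import torch
import torch.nn as nn


class EdgeWeightPredictor(nn.Module):
    """Illustrative sketch: predict a weight per skeleton edge from the two
    endpoint features, then do one weighted message-passing step."""

    def __init__(self, dim):
        super().__init__()
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )
        self.node_proj = nn.Linear(dim, dim)

    def forward(self, feats, adj):
        # feats: (K, dim) keypoint features; adj: (K, K) float skeleton
        # adjacency (1 where two keypoints are connected).
        K = feats.size(0)
        pairs = torch.cat([feats.unsqueeze(1).expand(K, K, -1),
                           feats.unsqueeze(0).expand(K, K, -1)], dim=-1)
        w = torch.sigmoid(self.edge_mlp(pairs)).squeeze(-1) * adj  # edge weights
        w = w + torch.eye(K)                                       # self-loops
        deg = w.sum(dim=-1, keepdim=True).clamp(min=1e-6)
        feats = (w / deg) @ self.node_proj(feats)                  # normalized update
        return feats, w


# Toy usage: 4 keypoints with 32-dim features on a chain skeleton 0-1-2-3.
feats = torch.randn(4, 32)
adj = torch.tensor([[0., 1., 0., 0.],
                    [1., 0., 1., 0.],
                    [0., 1., 0., 1.],
                    [0., 0., 1., 0.]])
refined, edge_weights = EdgeWeightPredictor(dim=32)(feats, adj)
```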

Why it matters?

This research is important because it enhances the ability of AI models to understand and analyze various objects in real-world scenarios. By improving how models estimate poses across different categories, EdgeCape can help in applications like robotics, augmented reality, and computer vision, making these technologies more effective and versatile.

Abstract

Category-Agnostic Pose Estimation (CAPE) localizes keypoints across diverse object categories with a single model, using one or a few annotated support images. Recent works have shown that using a pose graph (i.e., treating keypoints as nodes in a graph rather than isolated points) helps handle occlusions and break symmetry. However, these methods assume a static pose graph with equal-weight edges, leading to suboptimal results. We introduce EdgeCape, a novel framework that overcomes these limitations by predicting the graph's edge weights, which optimizes localization. To further leverage structural priors, we propose integrating Markovian Structural Bias, which modulates the self-attention interaction between nodes based on the number of hops between them. We show that this improves the model's ability to capture global spatial dependencies. Evaluated on the MP-100 benchmark, which includes 100 categories and over 20K images, EdgeCape achieves state-of-the-art results in the 1-shot setting and leads among similar-sized methods in the 5-shot setting, significantly improving keypoint localization accuracy. Our code is publicly available.
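
To make the hop-based structural bias idea concrete, here is a toy single-head self-attention sketch in PyTorch in which a learned scalar bias per hop distance is added to the attention logits; the class name, hop-distance handling, and hyperparameters are assumptions made for illustration, not the paper's implementation:

```python
import torch
import torch.nn as nn


class HopBiasedSelfAttention(nn.Module):
    """Toy single-head self-attention over keypoint tokens whose logits are
    shifted by a learned bias per graph hop distance (illustrative only)."""

    def __init__(self, dim, max_hops=4):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.hop_bias = nn.Embedding(max_hops + 1, 1)  # one scalar per hop count
        self.scale = dim ** -0.5

    def forward(self, x, hop_dist):
        # x: (K, dim) keypoint features; hop_dist: (K, K) integer hop distances.
        hop_dist = hop_dist.clamp(max=self.hop_bias.num_embeddings - 1)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        logits = (q @ k.t()) * self.scale
        logits = logits + self.hop_bias(hop_dist).squeeze(-1)  # structural bias
        return logits.softmax(dim=-1) @ v


# Hand-written hop distances for a 4-keypoint chain skeleton 0-1-2-3.
hop_dist = torch.tensor([[0, 1, 2, 3],
                         [1, 0, 1, 2],
                         [2, 1, 0, 1],
                         [3, 2, 1, 0]])
x = torch.randn(4, 32)
out = HopBiasedSelfAttention(dim=32)(x, hop_dist)
```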