Discovering and using Spelke segments
Rahul Venkatesh, Klemen Kotar, Lilian Naing Chen, Seungwoo Kim, Luca Thomas Wheeler, Jared Watrous, Ashley Xu, Gia Ancone, Wanhee Lee, Honglin Chen, Daniel Bear, Stefan Stojanov, Daniel Yamins
2025-07-25
Summary
This paper introduces SpelkeNet, a visual world model that learns to identify objects by how they move together, rather than just by what they look like.
What's the problem?
Most image segmentation methods separate objects by semantic category, such as 'car' or 'tree', but category labels say little about how objects physically cohere and move, which is what matters for tasks like grasping or pushing things.
What's the solution?
The researchers built SpelkeNet to predict how each part of an image would move if it were poked or pushed. Pixels that are predicted to move together are grouped into 'Spelke segments' (named after the developmental psychologist Elizabeth Spelke), which reflect real-world physical boundaries better than category-based segments do; see the sketch below.
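A minimal sketch of the grouping idea, not the authors' implementation: given some model that predicts per-pixel motion in response to a virtual poke, pixels whose predicted motion agrees with the poked pixel's motion form one segment. The `predict_flow` stub, function names, and thresholds below are all hypothetical illustrations.

```python
import numpy as np

def predict_flow(image, poke_yx):
    """Stand-in for a learned world model such as SpelkeNet.

    A real model would predict, for every pixel, the 2D motion caused by
    poking the scene at `poke_yx`. Here we fabricate a toy flow field
    (a rigid patch around the poke translates; the rest stays still)
    purely so the grouping step below is runnable.
    """
    h, w = image.shape[:2]
    flow = np.zeros((h, w, 2), dtype=np.float32)
    y, x = poke_yx
    yy, xx = np.mgrid[0:h, 0:w]
    patch = (np.abs(yy - y) < 20) & (np.abs(xx - x) < 20)  # toy "object"
    flow[patch] = (0.0, 5.0)  # the whole patch moves together
    return flow

def spelke_segment(image, poke_yx, motion_thresh=0.5, sim_thresh=0.9):
    """Group pixels predicted to move coherently with the poked pixel.

    A pixel joins the segment if its predicted flow is (a) non-trivial
    and (b) similar in direction to the flow at the poke location.
    """
    flow = predict_flow(image, poke_yx)
    mag = np.linalg.norm(flow, axis=-1)
    moving = mag > motion_thresh

    ref = flow[poke_yx]                        # flow at the poked pixel
    ref_unit = ref / (np.linalg.norm(ref) + 1e-8)
    unit = flow / (mag[..., None] + 1e-8)
    cos_sim = unit @ ref_unit                  # cosine similarity to poke motion

    return moving & (cos_sim > sim_thresh)     # boolean Spelke-segment mask

if __name__ == "__main__":
    img = np.zeros((64, 64, 3), dtype=np.uint8)
    mask = spelke_segment(img, poke_yx=(32, 32))
    print("segment pixels:", int(mask.sum()))
```

This single-poke, single-threshold rule is the simplest possible version; a full system would presumably aggregate motion predictions over many virtual pokes to get robust segment boundaries.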
Why does it matter?
SpelkeNet outperforms existing segmentation methods at recovering these physically coherent segments, and its segments improve robot tasks that involve manipulating objects, making AI better at understanding and interacting with the physical world.
Abstract
SpelkeNet, a visual world model, outperforms supervised baselines at identifying Spelke objects and improves performance on physical object manipulation tasks.