EmbRACE-3K: Embodied Reasoning and Action in Complex Environments
Mingxian Lin, Wei Huang, Yitang Li, Chengjie Jiang, Kui Wu, Fangwei Zhong, Shengju Qian, Xin Wang, Xiaojuan Qi
2025-07-15
Summary
This paper introduces EmbRACE-3K, a new dataset created to evaluate and improve how vision-language models reason and act in complex environments by following language instructions.
What's the problem?
Vision-language models often struggle with tasks that require both perceiving the surroundings and reasoning over language instructions, especially in complex and varied environments.
What's the solution?
The researchers developed EmbRACE-3K, which comprises thousands of tasks in which an agent must use language to guide its actions across diverse environments. They evaluated models on this dataset both without any prior training (zero-shot) and after fine-tuning, showing how the dataset improves embodied reasoning.
Why it matters?
This matters because improving embodied reasoning in AI will help create smarter robots and systems that can understand instructions and interact effectively in the real world.
Abstract
EmbRACE-3K, a dataset of language-guided tasks in diverse environments, evaluates and improves the embodied reasoning capabilities of vision-language models through zero-shot and fine-tuned performance.