
FreeAskWorld: An Interactive and Closed-Loop Simulator for Human-Centric Embodied AI

Yuhang Peng, Yizhou Pan, Xinning He, Jihaoyu Yang, Xinyu Yin, Han Wang, Xiaoji Zheng, Chao Gao, Jiangtao Gong

2025-11-20


Summary

This paper introduces a new simulation environment called FreeAskWorld designed to help AI systems learn how to interact with the world and people more realistically, going beyond just moving around physically.

What's the problem?

Current AI simulations are good at basic movement and physical tasks, but they struggle with the complex social interactions humans have every day, like asking for directions or understanding what someone *means* rather than just what they *say*. Existing benchmarks don't really test an AI's ability to actively seek information and use it to complete tasks in a human-like way.

What's the solution?

The researchers created FreeAskWorld, a virtual world where AI agents use large language models (the same tech behind chatbots) to plan high-level actions and hold meaningful conversations. They built a large dataset within this world covering many different scenarios and interactions, centered on a "Direction Inquiry" task where agents must ask for and understand directions. They then tested existing AI models in this environment and improved them by fine-tuning them on data from FreeAskWorld.
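To make the closed-loop idea concrete, here is a minimal sketch of what a Direction Inquiry episode could look like. All names here (Observation, DirectionInquiryAgent, env.step, the llm callable) are hypothetical illustrations of the ask-then-act cycle described above, not FreeAskWorld's actual API.

```python
# Hypothetical sketch of a closed-loop Direction Inquiry episode.
# These classes and methods are illustrative only, not the real FreeAskWorld API.

from dataclasses import dataclass

@dataclass
class Observation:
    rgb_frame: bytes          # what the agent currently sees
    nearby_people: list[str]  # people the agent could ask for directions

class DirectionInquiryAgent:
    def __init__(self, llm):
        self.llm = llm  # any text-generation callable: prompt -> reply string

    def decide(self, obs: Observation, goal: str) -> str:
        # If someone is nearby, actively seek guidance instead of guessing.
        if obs.nearby_people:
            question = self.llm(
                f"You are trying to reach '{goal}'. "
                f"Ask {obs.nearby_people[0]} a short, natural question for directions."
            )
            return f"ASK: {question}"
        # Otherwise, plan the next movement from the current view.
        return self.llm(f"Goal: {goal}. Choose one action: FORWARD, LEFT, RIGHT, STOP.")

def run_episode(env, agent: DirectionInquiryAgent, goal: str, max_steps: int = 50):
    """Closed loop: each action changes the world, which changes the next observation,
    and replies from people come back to the agent as part of that observation."""
    obs = env.reset()
    for _ in range(max_steps):
        action = agent.decide(obs, goal)
        obs, done = env.step(action)
        if done:
            break
```

The point of the loop is that asking a question is itself an action: the answer arrives as part of the next observation, so interaction becomes another source of information the agent can learn from.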

Why it matters?

This work is important because it shows that teaching AI to interact socially – to ask questions and understand responses – is a key step towards creating truly intelligent systems. It also highlights that the act of interacting itself provides valuable information for the AI to learn from, making it better at understanding the world and completing tasks. This could lead to more natural and helpful AI assistants and robots in the future.

Abstract

As embodied intelligence emerges as a core frontier in artificial intelligence research, simulation platforms must evolve beyond low-level physical interactions to capture complex, human-centered social behaviors. We introduce FreeAskWorld, an interactive simulation framework that integrates large language models (LLMs) for high-level behavior planning and semantically grounded interaction, informed by theories of intention and social cognition. Our framework supports scalable, realistic human-agent simulations and includes a modular data generation pipeline tailored for diverse embodied tasks. To validate the framework, we extend the classic Vision-and-Language Navigation (VLN) task into an interaction-enriched Direction Inquiry setting, wherein agents can actively seek and interpret navigational guidance. We present and publicly release FreeAskWorld, a large-scale benchmark dataset comprising reconstructed environments, six diverse task types, 16 core object categories, 63,429 annotated sample frames, and more than 17 hours of interaction data to support training and evaluation of embodied AI systems. We benchmark VLN models and human participants under both open-loop and closed-loop settings. Experimental results demonstrate that models fine-tuned on FreeAskWorld outperform their original counterparts, achieving enhanced semantic understanding and interaction competency. These findings underscore the efficacy of socially grounded simulation frameworks in advancing embodied AI systems toward sophisticated high-level planning and more naturalistic human-agent interaction. Importantly, our work underscores that interaction itself serves as an additional information modality.
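The abstract's distinction between open-loop and closed-loop evaluation can be illustrated with a short, hypothetical sketch; the function and environment interfaces below are assumptions for illustration only, not the paper's evaluation code.

```python
# Illustrative contrast between open-loop and closed-loop evaluation.
# Function names, env interface, and metrics are hypothetical.

def evaluate_open_loop(model, recorded_trajectories):
    """Open loop: the model predicts actions for pre-recorded observations;
    its predictions never affect what it sees next."""
    correct = total = 0
    for episode in recorded_trajectories:
        for obs, expert_action in episode:
            correct += int(model.predict(obs) == expert_action)
            total += 1
    return correct / max(total, 1)

def evaluate_closed_loop(model, env, goals, max_steps=100):
    """Closed loop: the model's actions are executed in the simulator,
    so errors compound and interaction (e.g. asking for directions) matters."""
    successes = 0
    for goal in goals:
        obs = env.reset(goal)
        for _ in range(max_steps):
            obs, done, success = env.step(model.predict(obs))
            if done:
                successes += int(success)
                break
    return successes / len(goals)
```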