Agentic Design of Compositional Machines

Wenqian Zhang, Weiyang Liu, Zhen Liu

2025-10-17

Summary

This paper investigates whether large language models (LLMs), the kind powering many AI chatbots, can actually *create* things, specifically complex machines. The authors test this by having LLMs design machines from pre-made parts in a simulated physical environment, focusing on machines that must move or manipulate objects.

What's the problem?

Designing complex machines is a hallmark of human intelligence and a foundation of engineering practice. The question is whether AI can do it too. Current models struggle with tasks that require understanding how things behave in the physical world, such as building a machine that actually moves as intended. They fall short in reasoning spatially, planning an assembly step by step, and following instructions accurately during construction.

What's the solution?

The researchers created a testing environment called BesiegeField, built on the machine-building game Besiege, in which an AI can assemble machines from parts, run them in a physics simulation, and receive a reward reflecting how well they work. They then benchmarked state-of-the-art LLMs with agentic workflows in this environment and identified where the models fail. Because current open-source models fall short, they explored reinforcement learning (RL), which lets a model learn through trial and error: they curated a cold-start dataset for the task and ran RL finetuning experiments.

Why does it matter?

This research is important because it tests the limits of what AI can do. If AI can learn to design machines, it could lead to automated engineering, faster prototyping, and new discoveries in robotics and other fields. It also highlights the specific skills AI still needs to develop, such as spatial reasoning, strategic assembly, and instruction-following, to become a capable designer of physical systems.

Abstract

The design of complex machines stands as both a marker of human intelligence and a foundation of engineering practice. Given recent advances in large language models (LLMs), we ask whether they, too, can learn to create. We approach this question through the lens of compositional machine design: a task in which machines are assembled from standardized components to meet functional demands like locomotion or manipulation in a simulated physical environment. To support this investigation, we introduce BesiegeField, a testbed built on the machine-building game Besiege, which enables part-based construction, physical simulation and reward-driven evaluation. Using BesiegeField, we benchmark state-of-the-art LLMs with agentic workflows and identify key capabilities required for success, including spatial reasoning, strategic assembly, and instruction-following. As current open-source models fall short, we explore reinforcement learning (RL) as a path to improvement: we curate a cold-start dataset, conduct RL finetuning experiments, and highlight open challenges at the intersection of language, machine design, and physical reasoning.
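
To make the idea of compositional machine design concrete, here is a minimal Python sketch of a part-based machine description, a structural validity check, and a toy locomotion reward. Everything in it is an illustrative assumption: the `Part` fields, the part names, `is_valid_assembly`, and `locomotion_reward` are hypothetical and are not BesiegeField's actual data format, API, or reward definition.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Part:
    """One standardized component (hypothetical schema, not BesiegeField's)."""
    part_id: int
    kind: str                 # e.g. "block" or "wheel" (illustrative names)
    parent_id: Optional[int]  # part this one attaches to; None for the root


def is_valid_assembly(parts: List[Part]) -> bool:
    """Check ids are unique, there is exactly one root, and every part
    attaches to a part that was declared earlier in the list."""
    seen = set()
    roots = 0
    for p in parts:
        if p.part_id in seen:
            return False
        if p.parent_id is None:
            roots += 1
        elif p.parent_id not in seen:
            return False
        seen.add(p.part_id)
    return roots == 1


def locomotion_reward(start_x: float, end_x: float, valid: bool) -> float:
    """Toy reward (an assumption): forward displacement reported by a
    simulator, clipped at zero, and zero for an invalid machine."""
    if not valid:
        return 0.0
    return max(0.0, end_x - start_x)


# A three-part "cart": one root block with two wheels attached to it.
machine = [Part(0, "block", None), Part(1, "wheel", 0), Part(2, "wheel", 0)]
reward = locomotion_reward(0.0, 3.5, is_valid_assembly(machine))
```

In a setup like this, an LLM would emit the part list, a simulator would supply the start and end positions, and the resulting scalar could serve as the signal for reward-driven evaluation or RL finetuning.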
