DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering
Yinsheng Li, Zhen Dong, Yi Shao
2025-07-17
Summary
This paper introduces DrafterBench, a new benchmark for testing how well large language models can automate tasks in civil engineering, particularly the revision of technical drawings.
What's the problem?
Civil engineering relies on complex technical drawings and structured data that must be interpreted, edited, and reasoned about with care, and current AI models struggle to do this accurately and efficiently.
What's the solution?
The authors designed DrafterBench around tasks that test whether AI agents can comprehend structured drawing data, follow detailed instructions, execute functions correctly, and reason critically about the changes a drawing requires. They then used the benchmark to evaluate a range of language models and compare their performance on these tasks (a conceptual sketch of such an evaluation loop follows).
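To make the evaluation setup concrete, here is a minimal Python sketch of what a DrafterBench-style harness could look like: each task pairs a natural-language revision instruction with structured drawing data, the agent is asked to produce tool calls, and the result is scored against ground-truth calls. All names here (Task, run_agent, score_task, evaluate) are hypothetical illustrations, not the benchmark's actual API.

```python
"""Conceptual sketch of a drawing-revision benchmark loop.

Hypothetical names throughout; not DrafterBench's real interface.
"""
from dataclasses import dataclass, field


@dataclass
class Task:
    """One task: a revision instruction over structured drawing data."""
    instruction: str                  # natural-language revision request
    drawing: dict                     # structured drawing data (layers, entities)
    expected_calls: list = field(default_factory=list)  # ground-truth tool calls


def run_agent(llm, task: Task) -> list:
    """Ask the model to translate the instruction into tool calls.

    A real harness would expose callable tools (move, resize, annotate, ...)
    and parse the model's structured output; here we assume `llm` returns
    a list of (function_name, kwargs) pairs.
    """
    return llm(task.instruction, task.drawing)


def score_task(predicted: list, expected: list) -> float:
    """Fraction of ground-truth tool calls the agent reproduced exactly."""
    if not expected:
        return 1.0
    hits = sum(1 for call in expected if call in predicted)
    return hits / len(expected)


def evaluate(llm, tasks: list[Task]) -> float:
    """Average score across all benchmark tasks."""
    scores = [score_task(run_agent(llm, t), t.expected_calls) for t in tasks]
    return sum(scores) / len(scores)
```

A harness like this rewards agents that follow instructions precisely and choose the right functions, which matches the capabilities the paper says DrafterBench measures: structured data comprehension, instruction following, function execution, and critical reasoning.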
Why it matters?
Automating technical drawing revision can save time and reduce errors in civil engineering projects, making design and construction processes faster, safer, and more cost-effective with the help of AI.
Abstract
DrafterBench is an open-source benchmark for evaluating LLM agents in technical drawing revision, assessing their capabilities in structured data comprehension, function execution, instruction following, and critical reasoning.