
ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents

Yilei Jiang, Yaozhi Zheng, Yuxuan Wan, Jiaming Han, Qunzhong Wang, Michael R. Lyu, Xiangyu Yue

2025-07-31


Summary

This paper introduces ScreenCoder, a system that automatically turns pictures or designs of user interfaces (UIs) into working front-end code such as HTML and CSS, the languages used to build websites and apps.

What's the problem?

Converting UI designs into working code is hard because the system must recognize every part of the interface and understand how those parts fit together visually and spatially. Many existing AI models rely on text prompts alone and miss important visual design details, which makes their output less accurate and less flexible.

What's the solution?

ScreenCoder splits the task into three clear stages handled by different AI agents: one looks at the image and identifies all the UI components, another plans how those components should be arranged according to good design rules, and a third writes the actual code from that plan. This modular approach is easier to inspect, improves accuracy, and enables large-scale training on automatically generated image-code pairs.
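The three-stage flow described above can be sketched as a simple pipeline. Everything below is illustrative: the function names, data structures, and stubbed outputs are hypothetical stand-ins, since in the real system each stage is backed by a vision-language model rather than hand-written rules.

```python
# Hypothetical sketch of a three-stage UI-to-code pipeline:
# grounding (detect components) -> planning (arrange layout) -> generation (emit code).
# All names and return values here are invented for illustration only.

from dataclasses import dataclass


@dataclass
class Component:
    kind: str            # e.g. "header", "button", "image"
    box: tuple           # bounding box (x, y, width, height) in the screenshot


def grounding_agent(screenshot: str) -> list[Component]:
    """Stage 1 (stubbed): detect UI components and where they sit in the image."""
    return [
        Component("header", (0, 0, 800, 80)),
        Component("button", (650, 20, 120, 40)),
    ]


def planning_agent(components: list[Component]) -> dict:
    """Stage 2 (stubbed): organize detected components into a layout hierarchy."""
    return {"root": {"kind": "page", "children": [c.kind for c in components]}}


def generation_agent(layout: dict) -> str:
    """Stage 3 (stubbed): emit HTML for the planned layout."""
    children = "\n".join(
        f'  <div class="{kind}"></div>' for kind in layout["root"]["children"]
    )
    return f"<body>\n{children}\n</body>"


def screenshot_to_code(screenshot: str) -> str:
    """Run the full pipeline: image -> components -> layout plan -> code."""
    components = grounding_agent(screenshot)
    layout = planning_agent(components)
    return generation_agent(layout)


print(screenshot_to_code("design.png"))
```

Because each stage has a narrow, well-defined input and output, any one of them can be improved or swapped out independently, which is the practical benefit of the modular design.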

Why it matters?

This matters because it can speed up software development and help more people build polished, working websites and applications without writing code themselves, lowering the barrier between design and programming.

Abstract

A modular multi-agent framework improves UI-to-code generation by integrating vision-language models, hierarchical layout planning, and adaptive prompt-based synthesis, achieving state-of-the-art performance.