UI2Code^N: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation

Zhen Yang, Wenyi Hong, Mingde Xu, Xinyue Fan, Weihan Wang, Jiele Cheng, Xiaotao Gu, Jie Tang

2025-11-17

Summary

This paper introduces a new AI model, UI2Code^N, designed to automatically write the code for user interfaces (UIs) – things like buttons, menus, and layouts you see in apps and websites.

What's the problem?

Currently, AI models that try to write UI code aren't very good at understanding both images of a UI *and* text instructions at the same time. They also usually try to produce the whole UI in a single pass, which isn't how developers actually work: real developers build a UI, look at how it renders, and refine it step by step. Because existing models never see that visual feedback on their own output, they fall short of what they could achieve.

What's the solution?

The researchers created UI2Code^N, a visual language model trained in stages (pretraining, fine-tuning, and reinforcement learning) to get better at connecting what a UI looks like with the code behind it. It can not only *generate* UI code from a design, but also *edit* existing code based on instructions and *polish* code using feedback to make it better. Importantly, it's designed to work interactively, meaning you can give it feedback over multiple turns to get the UI exactly how you want it.
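To make the interactive idea concrete, here is a minimal sketch of a generate-render-polish loop. The function names (generate_ui_code, render_to_screenshot, polish_ui_code) are illustrative stand-ins, not the actual UI2Code^N API, and the stubs only show the shape of the workflow.

```python
# Hypothetical sketch of the interactive UI-to-code loop described above.
# All function names are illustrative stubs, not the UI2Code^N codebase.

def generate_ui_code(design_image: bytes, instructions: str) -> str:
    """Stub: ask the model for an initial HTML/CSS implementation of the design."""
    return "<html><body><!-- initial draft --></body></html>"

def render_to_screenshot(code: str) -> bytes:
    """Stub: render the generated code (e.g., in a headless browser) to an image."""
    return b"rendered-screenshot"

def polish_ui_code(code: str, rendered: bytes, target: bytes, feedback: str) -> str:
    """Stub: ask the model to revise the code given the rendered result and feedback."""
    return code  # unchanged in this stub

def interactive_ui_to_code(design_image: bytes, instructions: str, max_turns: int = 3) -> str:
    """Generate UI code, then iteratively render and polish it over several feedback turns."""
    code = generate_ui_code(design_image, instructions)
    for turn in range(max_turns):
        rendered = render_to_screenshot(code)
        feedback = f"Turn {turn + 1}: align the rendered page with the target design."
        code = polish_ui_code(code, rendered, design_image, feedback)
    return code

if __name__ == "__main__":
    final_code = interactive_ui_to_code(b"target-design.png", "Recreate this landing page.")
    print(final_code)
```

The key point is that the model sees its own rendered output on every turn, so later turns can correct layout or styling mistakes that a single-pass approach would have locked in.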

Why it matters?

This work is important because it significantly improves the ability of AI to automate UI development. UI2Code^N sets a new state of the art among open-source models and performs comparably to leading closed-source models such as Claude-4-Sonnet and GPT-5, and because it's open-source, anyone can use and build upon it. This could make building apps and websites much faster and easier for developers.

Abstract

User interface (UI) programming is a core yet highly complex part of modern software development. Recent advances in visual language models (VLMs) highlight the potential of automatic UI coding, but current approaches face two key limitations: multimodal coding capabilities remain underdeveloped, and single-turn paradigms make little use of iterative visual feedback. We address these challenges with an interactive UI-to-code paradigm that better reflects real-world workflows and raises the upper bound of achievable performance. Under this paradigm, we present UI2Code^N, a visual language model trained through staged pretraining, fine-tuning, and reinforcement learning to achieve foundational improvements in multimodal coding. The model unifies three key capabilities: UI-to-code generation, UI editing, and UI polishing. We further explore test-time scaling for interactive generation, enabling systematic use of multi-turn feedback. Experiments on UI-to-code and UI polishing benchmarks show that UI2Code^N establishes a new state of the art among open-source models and achieves performance comparable to leading closed-source models such as Claude-4-Sonnet and GPT-5. Our code and models are available at https://github.com/zai-org/UI2Code_N.
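The abstract also mentions test-time scaling for interactive generation. As a rough illustration only, the sketch below assumes a simple best-of-N scheme in which several candidates are sampled and each is refined over a few polishing turns before the closest visual match is kept; the paper's actual procedure may differ, and every function here is a hypothetical stub.

```python
# Hypothetical sketch of test-time scaling for interactive UI-to-code generation:
# spend more compute at inference by sampling several candidate programs and running
# extra polishing turns, then keep the candidate whose rendering best matches the target.
# All function names are illustrative stubs, not the UI2Code^N codebase.
import random

def sample_candidate_code(design_image: bytes, seed: int) -> str:
    """Stub: one sampled UI implementation from the model."""
    return f"<html><!-- candidate {seed} --></html>"

def polish_once(code: str, design_image: bytes) -> str:
    """Stub: one round of feedback-driven polishing against the target design."""
    return code

def visual_similarity(code: str, design_image: bytes) -> float:
    """Stub: score how closely the rendered code matches the target screenshot."""
    return random.random()

def test_time_scaled_generation(design_image: bytes,
                                num_candidates: int = 4,
                                polish_turns: int = 2) -> str:
    """Best-of-N candidates, each refined for a few polishing turns; keep the best scorer."""
    best_code, best_score = "", float("-inf")
    for seed in range(num_candidates):
        code = sample_candidate_code(design_image, seed)
        for _ in range(polish_turns):
            code = polish_once(code, design_image)
        score = visual_similarity(code, design_image)
        if score > best_score:
            best_code, best_score = code, score
    return best_code
```

Under this framing, "scaling" simply means increasing num_candidates or polish_turns at inference time to trade extra compute for a closer match to the target design.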