TikZero: Zero-Shot Text-Guided Graphics Program Synthesis

Jonas Belouadi, Eddy Ilg, Margret Keuper, Hideki Tanaka, Masao Utiyama, Raj Dabre, Steffen Eger, Simone Paolo Ponzetto

2025-03-21

Summary

This paper is about teaching computers to generate figures from text descriptions in the form of graphics programs (code in languages like TikZ), even when they have never been trained on examples that pair such programs with captions.

What's the problem?

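Training a model to turn captions into graphics programs normally requires many programs paired with matching captions, and this kind of aligned data is scarce. Graphics programs without captions and captioned raster images are much easier to find, but they don't line up with each other.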

What's the solution?

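The researchers developed TikZero, which separates graphics program generation from text understanding and uses image representations as the bridge between them. The two parts can be trained independently, one on graphics programs without captions and the other on captioned images, and are then combined at inference time to generate a program from a caption.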

Why does it matter?

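This work matters because it makes it possible to generate precise, editable figures from text even when little caption-aligned training data is available. The authors report that TikZero substantially outperforms baselines that can only learn from caption-aligned programs.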

Abstract

With the rise of generative AI, synthesizing figures from text captions becomes a compelling application. However, achieving high geometric precision and editability requires representing figures as graphics programs in languages like TikZ, and aligned training data (i.e., graphics programs with captions) remains scarce. Meanwhile, large amounts of unaligned graphics programs and captioned raster images are more readily available. We reconcile these disparate data sources by presenting TikZero, which decouples graphics program generation from text understanding by using image representations as an intermediary bridge. It enables independent training on graphics programs and captioned images and allows for zero-shot text-guided graphics program synthesis during inference. We show that our method substantially outperforms baselines that can only operate with caption-aligned graphics programs. Furthermore, when leveraging caption-aligned graphics programs as a complementary training signal, TikZero matches or exceeds the performance of much larger models, including commercial systems like GPT-4o. Our code, datasets, and select models are publicly available.
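
The decoupling described in the abstract can be pictured with a deliberately tiny toy. This is not the authors' implementation: the two-dimensional "image embeddings", the word-overlap matcher, the nearest-neighbour decoder, and every name and data point below are hypothetical stand-ins for trained neural components. The sketch only shows the data flow: one stage is built from captioned images alone, the other from graphics programs alone, and they meet only through the shared image-embedding space.

```python
# Toy illustration only: all names, embeddings, and data here are hypothetical.
from math import dist

# Source 1: unaligned graphics programs, each paired with an embedding of its
# rendered image. No captions are involved.
PROGRAMS = [
    ((1.0, 0.0), r"\draw (0,0) circle (1);"),
    ((0.0, 1.0), r"\draw (0,0) rectangle (2,1);"),
]

# Source 2: captioned raster images, each paired with an embedding of the
# image. No graphics programs are involved.
CAPTIONED_IMAGES = [
    ("a circle", (0.9, 0.1)),
    ("a rectangle", (0.1, 0.9)),
]


def text_to_image_embedding(caption):
    """Map a caption into image-embedding space using captioned images only."""
    words = set(caption.lower().split())
    # Pick the known caption that shares the most words with the query.
    best = max(CAPTIONED_IMAGES, key=lambda pair: len(words & set(pair[0].split())))
    return best[1]


def image_embedding_to_program(embedding):
    """Decode an image embedding into a program using programs only."""
    nearest = min(PROGRAMS, key=lambda pair: dist(pair[0], embedding))
    return nearest[1]


def synthesize(caption):
    """Compose the two independently built stages at inference time."""
    return image_embedding_to_program(text_to_image_embedding(caption))


print(synthesize("draw a circle"))
```

No caption is ever paired directly with a program in this toy, yet `synthesize` still returns a program for a caption, which is the sense in which the composition is zero-shot. In the real system both stages would be learned models rather than lookups.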