VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models

Qijia He, Xunmei Liu, Hammaad Memon, Ziang Li, Zixian Ma, Jaemin Cho, Jason Ren, Daniel S Weld, Ranjay Krishna

2026-03-28


Summary

This paper introduces VFIG, a family of vision-language models that automatically convert images of figures and diagrams into editable vector graphics (SVG). It also presents a large new dataset, VFIG-DATA, and an evaluation suite, VFIG-BENCH, for measuring how well a conversion preserves a figure's structure.

What's the problem?

Often, the original files used to create diagrams in research papers or presentations are lost, leaving only image files like JPEGs or PNGs. These image files are hard to edit or scale without losing quality. Recreating these diagrams by hand is time-consuming and requires specialized skills, so there's a need for a way to automatically recreate them in a flexible, editable format like SVG.

What's the solution?

The researchers created VFIG, a family of vision-language models trained to convert images of diagrams into SVG files. They trained these models on VFIG-DATA, a new dataset of about 66,000 figure-SVG pairs drawn from real paper figures and procedurally generated diagrams. Training follows a coarse-to-fine curriculum: supervised fine-tuning first teaches the model the basic shapes that SVGs are built from, and reinforcement learning then refines it to produce complete diagrams with faithful overall appearance and consistent layout. They also developed VFIG-BENCH, an evaluation suite with new metrics that measure how well the model preserves the structure of complex figures.

Why does it matter?

This work makes it much easier to reuse diagrams from scientific papers and other sources. Automatically converting images into editable vector graphics saves time and effort, and lets people modify and scale figures without redrawing them. VFIG achieves state-of-the-art results among open-source models and performs on par with GPT-5.2, reaching a VLM-Judge score of 0.829 on VFIG-BENCH. The new dataset and evaluation tools should also help further work on this task.

Abstract

Scalable Vector Graphics (SVG) are an essential format for technical illustration and digital design, offering precise resolution independence and flexible semantic editability. In practice, however, original vector source files are frequently lost or inaccessible, leaving only "flat" rasterized versions (e.g., PNG or JPEG) that are difficult to modify or scale. Manually reconstructing these figures is a prohibitively labor-intensive process, requiring specialized expertise to recover the original geometric intent. To bridge this gap, we propose VFIG, a family of Vision-Language Models trained for complex and high-fidelity figure-to-SVG conversion. While this task is inherently data-driven, existing datasets are typically small-scale and lack the complexity of professional diagrams. We address this by introducing VFIG-DATA, a large-scale dataset of 66K high-quality figure-SVG pairs, curated from a diverse mix of real-world paper figures and procedurally generated diagrams. Recognizing that SVGs are composed of recurring primitives and hierarchical local structures, we introduce a coarse-to-fine training curriculum that begins with supervised fine-tuning (SFT) to learn atomic primitives and transitions to reinforcement learning (RL) refinement to optimize global diagram fidelity, layout consistency, and topological edge cases. Finally, we introduce VFIG-BENCH, a comprehensive evaluation suite with novel metrics designed to measure the structural integrity of complex figures. VFIG achieves state-of-the-art performance among open-source models and performs on par with GPT-5.2, achieving a VLM-Judge score of 0.829 on VFIG-BENCH.
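To make the phrase "recurring primitives and hierarchical local structures" concrete, here is a minimal Python sketch that builds a two-box diagram out of standard SVG elements (rect, text, line, and g for grouping). It is an illustration only: the function names, the layout, and the "node-..." ids are hypothetical, and it is not VFIG output or part of the authors' pipeline.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def _tag(name: str) -> str:
    """Return the namespace-qualified tag name for an SVG element."""
    return f"{{{SVG_NS}}}{name}"


def add_box(parent: ET.Element, x: int, label: str) -> ET.Element:
    """Add one labeled box: a <g> group holding a <rect> and a <text>.

    The group is the "hierarchical local structure"; the rect and text
    are the primitives. The id scheme is a hypothetical choice.
    """
    group = ET.SubElement(parent, _tag("g"), {"id": f"node-{label}"})
    ET.SubElement(group, _tag("rect"), {
        "x": str(x), "y": "20", "width": "100", "height": "40",
        "fill": "white", "stroke": "black",
    })
    text = ET.SubElement(group, _tag("text"), {
        "x": str(x + 50), "y": "45", "text-anchor": "middle",
    })
    text.text = label
    return group


def build_diagram() -> str:
    """Build a tiny two-node diagram and return it as an SVG string."""
    ET.register_namespace("", SVG_NS)  # emit a default xmlns, no prefix
    svg = ET.Element(_tag("svg"), {
        "width": "260", "height": "80", "viewBox": "0 0 260 80",
    })
    add_box(svg, 10, "Input")
    add_box(svg, 150, "Output")
    # A single line primitive connects the right edge of the first box
    # to the left edge of the second.
    ET.SubElement(svg, _tag("line"), {
        "x1": "110", "y1": "40", "x2": "150", "y2": "40", "stroke": "black",
    })
    return ET.tostring(svg, encoding="unicode")


svg_text = build_diagram()
print(svg_text)
```

Because each box is a group of named primitives rather than a grid of pixels, relabeling or moving a node means changing an attribute or a text value. This is the kind of editability that a rasterized PNG or JPEG loses.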
