SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing
Xiangchao Yan, Shiyang Feng, Jiakang Yuan, Renqiu Xia, Bin Wang, Bo Zhang, Lei Bai
2025-03-11
Summary
This paper presents SurveyForge, an AI system that writes academic survey papers. It learns from how human-written surveys are structured and checks its output against a benchmark built from human-written surveys, with the goal of matching human quality.
What's the problem?
AI-generated survey papers often have poorly organized outlines and inaccurate citations, which makes them less reliable than surveys written by humans.
What's the solution?
SurveyForge first analyzes how human-written surveys organize their outlines. It then drafts the content using high-quality papers that it retrieves, and refines the result. A companion benchmark, SurveyBench, scores generated surveys on three dimensions: reference, outline, and content quality.
Why it matters?
This helps researchers save time writing surveys of existing work while keeping them well organized and trustworthy, making it easier to get up to speed on new topics.
Abstract
Survey papers play a crucial role in scientific research, especially given the rapid growth of research publications. Recently, researchers have begun using LLMs to automate survey generation for better efficiency. However, the quality gap between LLM-generated surveys and those written by humans remains significant, particularly in terms of outline quality and citation accuracy. To close these gaps, we introduce SurveyForge, which first generates the outline by analyzing the logical structure of human-written outlines and referring to retrieved domain-related articles. Subsequently, leveraging high-quality papers retrieved from memory by our scholar navigation agent, SurveyForge automatically generates and refines the content of the article. Moreover, to achieve a comprehensive evaluation, we construct SurveyBench, which includes 100 human-written survey papers for win-rate comparison and assesses AI-generated survey papers across three dimensions: reference, outline, and content quality. Experiments demonstrate that SurveyForge can outperform previous works such as AutoSurvey.
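The abstract mentions a win-rate comparison against human-written surveys across three dimensions but does not say how the rate is computed. The sketch below is only an illustration under stated assumptions: each AI-vs-human pair receives a hypothetical "win", "tie", or "loss" label per dimension, and ties count as half a win. Neither the label format nor the tie convention comes from the paper.

```python
from collections import Counter
from typing import Dict, List

# The three SurveyBench dimensions named in the abstract.
DIMENSIONS = ("reference", "outline", "content")


def win_rates(judgments: List[Dict[str, str]]) -> Dict[str, float]:
    """Fraction of AI-vs-human pairs the AI survey wins, per dimension.

    Assumption (not from the paper): each judgment maps a dimension to one of
    the hypothetical labels "win", "tie", or "loss", and a tie counts as
    half a win.
    """
    rates: Dict[str, float] = {}
    for dim in DIMENSIONS:
        # Count how often each label occurs for this dimension.
        counts = Counter(j[dim] for j in judgments)
        total = sum(counts.values())
        if total == 0:
            rates[dim] = 0.0  # no comparisons available for this dimension
            continue
        rates[dim] = (counts["win"] + 0.5 * counts["tie"]) / total
    return rates
```

With four hypothetical pairs, calling `win_rates` on the labels below would give 0.625 for reference, 0.375 for outline, and 0.625 for content. Dropping ties instead of halving them is an equally common convention and would change these numbers.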