A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models

Haopeng Zhang, Philip S. Yu, Jiawei Zhang

2024-06-21

Summary

This paper provides a comprehensive overview of the evolution of text summarization techniques, focusing on the shift from traditional statistical methods to modern approaches using large language models (LLMs).

What's the problem?

Text summarization has changed significantly over the years, but the field lacks a clear account of how its methods have developed and what challenges researchers face today. Many existing studies focus on specific techniques without providing a complete picture of the field, making it hard for new researchers to navigate the landscape of summarization research.

What's the solution?

The authors organized their survey into two main parts. The first part reviews older summarization methods, including statistical techniques and deep learning approaches that were used before the rise of LLMs. The second part examines recent advancements in summarization using LLMs, discussing how these models have improved the quality and efficiency of generating summaries. The paper also highlights current research trends, challenges, and future directions for study in this area.
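To make the "traditional statistical methods" mentioned above concrete, here is a minimal sketch of a classic extractive approach: score each sentence by the TF-IDF weights of its words and keep the top-scoring sentences. This is an illustrative example of the general technique, not code from the survey; the function name and scoring details are my own choices.

```python
# Sketch of TF-IDF-based extractive summarization (illustrative only).
import math
import re
from collections import Counter

def tfidf_summarize(text, num_sentences=2):
    # Naive sentence split on ., !, ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    n = len(sentences)

    # Document frequency: in how many sentences each word appears.
    df = Counter()
    for words in tokenized:
        df.update(set(words))

    def score(words):
        if not words:
            return 0.0
        tf = Counter(words)
        # Average TF-IDF weight over the sentence's unique words.
        return sum(
            (tf[w] / len(words)) * math.log(n / df[w]) for w in tf
        ) / len(tf)

    # Rank sentences by score, then restore original document order.
    ranked = sorted(range(n), key=lambda i: score(tokenized[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Approaches like this (and refinements such as TextRank or LexRank) dominated before neural methods; the survey's second part covers how LLMs instead generate abstractive summaries directly.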

Why it matters?

This research is important because it helps both new and experienced researchers understand the progress made in text summarization and the impact of new technologies like LLMs. By providing a clear overview of past and present methods, the survey can guide future research efforts, ultimately leading to better tools for automatically summarizing information, a capability that is increasingly necessary in our data-driven world.

Abstract

Text summarization research has undergone several significant transformations with the advent of deep neural networks, pre-trained language models (PLMs), and recent large language models (LLMs). This survey thus provides a comprehensive review of the research progress and evolution in text summarization through the lens of these paradigm shifts. It is organized into two main parts: (1) a detailed overview of datasets, evaluation metrics, and summarization methods before the LLM era, encompassing traditional statistical methods, deep learning approaches, and PLM fine-tuning techniques, and (2) the first detailed examination of recent advancements in benchmarking, modeling, and evaluating summarization in the LLM era. By synthesizing existing literature and presenting a cohesive overview, this survey also discusses research trends, open challenges, and proposes promising research directions in summarization, aiming to guide researchers through the evolving landscape of summarization research.