OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs
Akari Asai, Jacqueline He, Rulin Shao, Weijia Shi, Amanpreet Singh, Joseph Chee Chang, Kyle Lo, Luca Soldaini, Sergey Feldman, Mike D'arcy, David Wadden, Matt Latzke, Minyang Tian, Pan Ji, Shengyan Liu, Hao Tong, Bohao Wu, Yanyu Xiong, Luke Zettlemoyer, Graham Neubig, Dan Weld, Doug Downey
2024-11-22

Summary
This paper presents OpenScholar, an AI tool designed to help researchers synthesize scientific literature by retrieving relevant information from a datastore of 45 million open-access papers.
What's the problem?
With the rapid growth of scientific publications, it has become increasingly difficult for researchers to keep up with and synthesize information from numerous studies. Traditional methods of literature review can be time-consuming and often lead to inaccuracies, especially when it comes to citing sources correctly.
What's the solution?
OpenScholar addresses these challenges with a retrieval-augmented language model (LM) that searches 45 million open-access papers for relevant passages and generates responses backed by citations. The authors also created ScholarQABench, a benchmark of 2,967 expert-written queries and 208 long-form answers spanning computer science, physics, neuroscience, and biomedicine, which they use to compare OpenScholar against other systems. On this benchmark, the 8-billion-parameter OpenScholar outperforms GPT-4o in correctness and achieves citation accuracy on par with human experts, while GPT-4o hallucinates citations 78 to 90% of the time.
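At a high level, the system follows a retrieve-generate-refine recipe: fetch relevant passages, draft a citation-backed answer, have the LM critique its own draft, and revise with additional retrieved evidence until no further feedback is produced. Below is a minimal, self-contained Python sketch of that control flow. It is an illustration of the general recipe rather than the authors' implementation: `embed`, `generate_answer`, and `generate_feedback` are hypothetical stand-ins for OpenScholar's trained retriever and generator LM, stubbed out here so the loop runs end to end.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    paper_id: str
    text: str


def embed(text: str) -> list[float]:
    """Hypothetical stand-in for a dense retriever encoder.
    A toy bag-of-characters embedding keeps the sketch runnable."""
    vec = [0.0] * 64
    for ch in text.lower():
        vec[ord(ch) % 64] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]


def retrieve(query: str, datastore: list[Passage], k: int = 3) -> list[Passage]:
    """Return the top-k passages ranked by similarity to the query."""
    q = embed(query)
    scored = sorted(
        datastore,
        key=lambda p: sum(a * b for a, b in zip(q, embed(p.text))),
        reverse=True,
    )
    return scored[:k]


def generate_answer(query: str, passages: list[Passage]) -> str:
    """Stand-in for the generator LM; every claim cites a retrieved passage."""
    cites = " ".join(f"[{p.paper_id}]" for p in passages)
    return f"Draft answer to {query!r}, grounded in {cites}"


def generate_feedback(answer: str) -> list[str]:
    """Stand-in for the self-feedback step: the LM critiques its own draft
    (e.g., 'verify claim X', 'add results on Y'). Empty list means done."""
    return []


def answer_with_self_feedback(query: str, datastore: list[Passage],
                              max_iters: int = 3) -> str:
    """Retrieve, draft, then iteratively refine using self-generated feedback."""
    passages = retrieve(query, datastore)
    answer = generate_answer(query, passages)
    for _ in range(max_iters):
        feedback = generate_feedback(answer)
        if not feedback:
            break
        for note in feedback:  # fetch extra evidence for each critique
            passages += retrieve(note, datastore, k=1)
        answer = generate_answer(query, passages)
    return answer


if __name__ == "__main__":
    store = [
        Passage("paper1", "Retrieval-augmented LMs can ground answers in cited passages."),
        Passage("paper2", "Citation hallucination rates of LMs on scientific questions."),
    ]
    print(answer_with_self_feedback("Do retrieval-augmented LMs cite more accurately?", store))
```

In the real system, the retriever is trained over the 45-million-paper datastore and the feedback step is the LM critiquing and revising its own draft; both are stubbed here purely to make the control flow concrete.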
Why it matters?
This research is significant because it offers scientists a powerful tool that can streamline the literature review process. By improving citation accuracy and the quality of synthesized information, OpenScholar can save researchers time and enhance the reliability of their work. Additionally, the authors release the code, models, datastore, and data openly, which encourages collaboration and further development of AI tools for scientific research.
Abstract
Scientific progress depends on researchers' ability to synthesize the growing body of literature. Can large language models (LMs) assist scientists in this task? We introduce OpenScholar, a specialized retrieval-augmented LM that answers scientific queries by identifying relevant passages from 45 million open-access papers and synthesizing citation-backed responses. To evaluate OpenScholar, we develop ScholarQABench, the first large-scale multi-domain benchmark for literature search, comprising 2,967 expert-written queries and 208 long-form answers across computer science, physics, neuroscience, and biomedicine. On ScholarQABench, OpenScholar-8B outperforms GPT-4o by 5% and PaperQA2 by 7% in correctness, despite being a smaller, open model. While GPT-4o hallucinates citations 78 to 90% of the time, OpenScholar achieves citation accuracy on par with human experts. OpenScholar's datastore, retriever, and self-feedback inference loop also improve off-the-shelf LMs: for instance, OpenScholar-GPT4o improves GPT-4o's correctness by 12%. In human evaluations, experts preferred OpenScholar-8B and OpenScholar-GPT4o responses over expert-written ones 51% and 70% of the time, respectively, compared to GPT-4o's 32%. We open-source all of our code, models, datastore, data, and a public demo.