This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Synthesizing scientific literature with retrieval-augmented language models
Citations: 3
Authors: 28
Year: 2026
Abstract
Scientific progress depends on the ability of researchers to synthesize the growing body of literature. Can large language models (LLMs) assist scientists in this task? Here we introduce OpenScholar, a specialized retrieval-augmented language model (LM)<sup>1</sup> that answers scientific queries by identifying relevant passages from 45 million open-access papers and synthesizing citation-backed responses. To evaluate OpenScholar, we develop ScholarQABench, the first large-scale multi-domain benchmark for literature search, comprising 2,967 expert-written queries and 208 long-form answers across computer science, physics, neuroscience and biomedicine. Despite being a smaller open model, OpenScholar-8B outperforms GPT-4o by 6.1% and PaperQA2 by 5.5% in correctness on a challenging multi-paper synthesis task from the new ScholarQABench. Although GPT-4o hallucinates citations 78-90% of the time, OpenScholar achieves citation accuracy on par with human experts. OpenScholar's data store, retriever and self-feedback inference loop improve off-the-shelf LMs: for instance, OpenScholar-GPT-4o improves the correctness of GPT-4o by 12%. In human evaluations, experts preferred OpenScholar-8B and OpenScholar-GPT-4o responses over expert-written ones 51% and 70% of the time, respectively, compared with 32% for GPT-4o. We open-source all artefacts, including our code, models, data store, datasets and a public demo.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,197 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,047 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,410 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,410 citations
Authors
- Akari Asai
- Jacqueline He
- Rulin Shao
- Weijia Shi
- Amanpreet Singh
- Joseph Chee Chang
- Kyle Shih-Huang Lo
- Luca Soldaini
- Sergey Feldman
- Kurt Dauth
- David Wadden
- Yixin Liu
- Jenna Sparks
- Jena D. Hwang
- Varsha Kishore
- Minyang Tian
- Pan Ji
- Shengyan Liu
- Hao Tong
- Bohao Wu
- Yanyu Xiong
- Sonal Gupta
- Graham Neubig
- Daniel S. Weld
- Doug Downey
- Wen-tau Yih
- Pang Wei Koh
- Hannaneh Hajishirzi