OpenAlex · Updated hourly · Last updated: 08.04.2026, 11:37

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Assessing the suitability of generative AI in the execution of literature retrieval within literature reviews

2025 · 0 citations · Pharmacoeconomics and Policy · Open Access
Open full text at the publisher

Citations: 0
Authors: 4
Year: 2025

Abstract

ChatGPT is a natural language processing tool that creates human-like conversations, answers questions, and generates written content when prompted by the end user. Because ChatGPT is trained on published material, allowing it to find and parse relevant literature about prompted targets, it is, in theory, an ideal way to make literature reviews more efficient. As more academics use the tool, gauging the accuracy of the information gathered by this automation becomes important. Our research aims to assess the accuracy of literature found by ChatGPT when performing a systematic review. We searched PubMed for recent systematic reviews on chronic diseases (e.g., diabetes and hypertension) published before November 2022. Two researchers extracted the aims and inclusion/exclusion criteria from each review. Using these criteria, we prompted ChatGPT to find 10 relevant articles. Researchers cross-referenced ChatGPT's results with Google Scholar, PubMed, and Tulane Library's database, as well as the original review's included articles. We categorized ChatGPT's results as fake, real but not in the review, or matched with the review. If ChatGPT provided ten real articles, we prompted it for another set. We calculated the rates of each outcome. Nine systematic reviews were selected to assess ChatGPT's ability to conduct literature reviews. In total, ChatGPT returned 90 articles across 9 sets of 10. Of these 90 citations, 38 (42%) were fake (the cited articles did not exist), 16 (18%) were real articles that matched the target review, and 36 (40%) were real articles that did not appear in the reviews; thus 52 (58%) were real overall. Furthermore, ChatGPT never returned 10/10 real articles in a single query. ChatGPT is a tool that can demonstrably make healthcare research tasks more efficient.
However, healthcare decision and policy makers cannot yet rely on pure generative AI output without knowing whether humans were involved throughout the research process. There thus appears to be an ability ceiling that the current ChatGPT algorithms cannot exceed.

• ChatGPT is a tool that can demonstrably make research tasks more efficient.
• This study found that 42% of ChatGPT citations were fake.
• Healthcare decision and policy makers cannot yet rely on pure generative AI output.
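As an illustration, the per-category rates reported in the abstract can be reproduced with a few lines. This is a minimal sketch: the labels and data structure are hypothetical, and only the counts (38 fake, 16 matched, 36 real but unmatched, out of 90) are taken from the abstract.

```python
from collections import Counter

# One outcome label per citation ChatGPT returned
# (counts taken from the abstract; the labels themselves are illustrative).
outcomes = ["fake"] * 38 + ["matched"] * 16 + ["real_unmatched"] * 36

counts = Counter(outcomes)
total = len(outcomes)

# Percentage of the 90 citations falling into each category, rounded
rates = {label: round(100 * n / total) for label, n in counts.items()}

print(total)  # 90
print(rates)  # {'fake': 42, 'matched': 18, 'real_unmatched': 40}
```

Note that the rounded percentages (42 + 18 + 40) happen to sum to 100 here, but rounding each category independently does not guarantee that in general.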

Similar works

Authors

Institutions

Topics

Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · AI in Service Interactions