This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Assessing the suitability of generative AI in the execution of literature retrieval within literature reviews
Citations: 0
Authors: 4
Year: 2025
Abstract
ChatGPT is a natural language processing tool that creates human-like conversations, responds to questions, and creates written content when prompted by the end-user. Because ChatGPT is trained on published material, allowing it to find and parse relevant literature about prompted targets, in theory it is an ideal way to make literature reviews more efficient. As more academics use the tool, gauging the accuracy of the information gathered by this automation becomes important. Our research aims to assess the accuracy of literature found by ChatGPT when performing a systematic review. We searched PubMed for recent systematic reviews on chronic diseases (e.g., diabetes and hypertension) published before November 2022. Two researchers extracted aims and inclusion/exclusion criteria from each review. Using these criteria, we prompted ChatGPT to find 10 relevant articles. Researchers cross-referenced ChatGPT’s results with Google Scholar, PubMed, and Tulane Library’s database, as well as the original review’s included articles. We categorized ChatGPT’s results as fake, real but not in the review, or matched with the review. If ChatGPT provided ten real articles, we prompted it for another set. We calculated the rates of each outcome. Nine systematic reviews were selected to assess ChatGPT’s ability to conduct literature reviews. In total, ChatGPT returned 90 articles across 9 sets of 10. Of these 90 articles, 38 (42%) were fake citations for articles that did not exist, 16 (18%) were real articles that matched the target review, and 36 (40%) were real articles that did not match the reviews; thus 58% of the articles were real. Furthermore, we never achieved 10/10 real articles in a single query. ChatGPT is a tool that can demonstrably make healthcare research tasks more efficient.
However, healthcare decision and policy makers cannot yet rely on pure generative AI output without knowing whether humans were involved throughout the research process. There thus appears to be a capability ceiling that the current ChatGPT algorithms cannot exceed.

• ChatGPT is a tool that can demonstrably make research tasks more efficient.
• This study found that 42% of ChatGPT citations were fake.
• Healthcare decision and policy makers cannot yet rely on pure generative AI output.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,402 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,270 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,702 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,507 citations