This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Performance of artificial intelligence large language models (Copilot and Gemini) compared to human experts in healthcare policy making: A mixed-methods cross-sectional study
Citations: 0
Authors: 6
Year: 2025
Abstract
Objective: This study aimed to assess the performance of Artificial Intelligence (AI) compared to human experts in healthcare policymaking.
Methods: This was a mixed-methods cross-sectional study conducted in Iran during 2024-2025. It compared and analyzed the responses of multiple AI Large Language Models (LLMs), including Bing AI Copilot and Gemini, and of a sample of 15 human experts, using confusion matrix analysis. This analysis provided comprehensive data on the respondents' ability to answer context-specific questions regarding healthcare policymaking, evaluated through multiple parameters: sensitivity, specificity, negative predictive value (NPV), positive predictive value (PPV), and overall accuracy.
Results: Copilot demonstrated a sensitivity of 0.867, specificity of 0, PPV of 0.722, NPV of 0, and accuracy of 0.65. In comparison, Gemini exhibited a sensitivity of 0.733, specificity of 0.4, PPV of 0.786, NPV of 0.333, and the same accuracy of 0.65. The human experts' responses indicated a sensitivity of 0.5808, specificity of 0.2571, PPV of 0.7189, NPV of 0.1579, and an accuracy of 0.5050.
Conclusion: The AI LLMs outperformed human experts in responding to the study questionnaire. The findings demonstrated the considerable potential of LLMs in enhancing healthcare policymaking, particularly by serving as complementary tools and collaborators alongside humans.
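The five metrics named in the abstract all derive from the four cells of a confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' analysis code. The abstract does not report raw cell counts; the counts used below (20 items, 15 positive and 5 negative) are an assumption, chosen only because they reproduce the rounded Copilot and Gemini values. The names `ConfusionMatrix` and `diagnostic_metrics` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # true positives
    fn: int  # false negatives
    fp: int  # false positives
    tn: int  # true negatives


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a metric is undefined (empty denominator).
    return numerator / denominator if denominator else math.nan


def diagnostic_metrics(cm: ConfusionMatrix) -> dict:
    """Standard confusion-matrix metrics, as listed in the abstract."""
    total = cm.tp + cm.fn + cm.fp + cm.tn
    return {
        "sensitivity": _ratio(cm.tp, cm.tp + cm.fn),
        "specificity": _ratio(cm.tn, cm.tn + cm.fp),
        "ppv": _ratio(cm.tp, cm.tp + cm.fp),
        "npv": _ratio(cm.tn, cm.tn + cm.fn),
        "accuracy": _ratio(cm.tp + cm.tn, total),
    }


# ASSUMED counts: not reported in the abstract, only consistent with its rounded values.
copilot = ConfusionMatrix(tp=13, fn=2, fp=5, tn=0)
gemini = ConfusionMatrix(tp=11, fn=4, fp=3, tn=2)

for name, cm in (("Copilot", copilot), ("Gemini", gemini)):
    rounded = {k: round(v, 3) for k, v in diagnostic_metrics(cm).items()}
    print(name, rounded)
```

Under these assumed counts, both models classify 13 of 20 items correctly, which is why they share an accuracy of 0.65 despite differing on every other metric. A specificity of 0 for Copilot corresponds to no true negatives at all. The human experts' pooled figures are not reconstructed here, since the abstract gives no basis for their cell counts.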