OpenAlex · Aktualisierung stündlich · Letzte Aktualisierung: 22.03.2026, 08:42

Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

Performance of artificial intelligence large language models (Copilot and Gemini) compared to human experts in healthcare policy making: A mixed-methods cross-sectional study

2025·0 Zitationen·Health Informatics JournalOpen Access
Volltext beim Verlag öffnen

0

Zitationen

6

Autoren

2025

Jahr

Abstract

ObjectiveThis study aimed to assess the performance of Artificial Intelligence (AI) compared to human experts in healthcare policymaking.MethodsThis was a mixed-methods cross-sectional study conducted in Iran during the years 2024-2025, comparing, and analyzing the responses of multiple AI Large Language Models (LLMs) including Bing AI Copilot and Gemini and a sample of 15 human experts-using confusion matrix analysis. This analysis provided comprehensive data on the respondents' ability to answer context-specific questions regarding healthcare policy making, evaluated through multiple parameters including sensitivity, specificity, negative predictive value (NPV), positive predictive value (PPV), and overall accuracy.ResultsCopilot demonstrated a sensitivity of 0.867, specificity of 0, PPV of 0.722, NPV of 0, and accuracy of 0.65. In comparison, Gemini exhibited a sensitivity of 0.733, specificity of 0.4, PPV of 0.786, NPV of 0.333, and also an accuracy of 0.65. Additionally, the human experts' responses indicated a sensitivity of 0.5808, specificity of 0.2571, PPV of 0.7189, NPV of 0.1579, and an accuracy of 0.5050.ConclusionThe AI LLMs outperformed human experts in responding to the study questionnaire. The findings demonstrated the considerable potential of the LLMs in enhancing healthcare policy-making, particularly by serving as complementary tools and collaborators alongside humans.

Ähnliche Arbeiten