This is an overview page with metadata for this scholarly article. The full text is available from the publisher.
Various AI Tools Versus ENT Experts in the Diagnosis of ENT diseases: A Cross-Sectional Comparative Study
Citations: 0
Authors: 3
Year: 2026
Abstract
Purpose: Numerous large language models (LLMs) have demonstrated exceptional capabilities in processing and comprehending natural-language data. Assessing the applicability of these LLMs in healthcare settings is of paramount importance. This study aims to evaluate the accuracy of various LLMs in diagnosing Ear, Nose, and Throat (ENT) pathology and to compare their performance with that of ENT experts.

Methods: We conducted a cross-sectional comparative study in which 32 real ENT cases were presented to ChatGPT-4, ChatGPT-5.2, Microsoft Copilot, ENT physicians, and ENT residents. Each participant or LLM provided three differential diagnoses. The study analyzed diagnostic accuracy rates and inter-rater agreement between the participant groups and the LLMs.

Results: ChatGPT-5.2 achieved the highest accuracy rate (90.1%), significantly higher than that of all other LLMs and of both ENT physicians and residents. ChatGPT-4 demonstrated an accuracy of 71.9%, which did not differ significantly from ENT physicians but was significantly higher than that of ENT residents. In contrast, Microsoft Copilot showed a significantly lower correctness rate than both ENT physicians (25% vs. 75.9%, p < 0.001) and ENT residents (25% vs. 70.3%, p < 0.001). Inter-rater agreement between ENT physicians and each LLM was poor. Regarding identification of the most critical diagnosis, ChatGPT-4, ChatGPT-5.2, and Microsoft Copilot mentioned it in 62.5%, 80.2%, and 84.4% of cases, respectively.

Conclusion: ChatGPT-4 demonstrated diagnostic accuracy comparable to that of ENT physicians, whereas ChatGPT-5.2 achieved substantially higher performance. In contrast, Microsoft Copilot exhibited significantly lower overall diagnostic accuracy than ENT experts (physicians and residents). However, with respect to identification of the most critical diagnoses, Microsoft Copilot outperformed both ChatGPT models.
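The abstract reports inter-rater agreement between physicians and each LLM but does not name the statistic used. A common choice for two raters is Cohen's kappa, which corrects observed agreement for agreement expected by chance. The sketch below is a minimal illustration with entirely hypothetical per-case correct/incorrect verdicts (the actual study data are not reproduced here):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters judging the same set of cases.

    Returns (p_o - p_e) / (1 - p_e), where p_o is the observed
    proportion of agreement and p_e the agreement expected by chance.
    """
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of cases where both raters gave the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b.get(c, 0) for c in counts_a) / (n * n)
    if p_e == 1.0:
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Hypothetical verdicts (1 = correct diagnosis, 0 = incorrect) for 8 cases:
physician = [1, 1, 0, 1, 1, 0, 1, 1]
llm       = [1, 0, 0, 1, 0, 1, 1, 1]
print(round(cohens_kappa(physician, llm), 3))  # → 0.143
```

A kappa near 0 (as in this toy example) is conventionally read as poor agreement, consistent with the abstract's finding that physician-LLM agreement was poor even when each rater's individual accuracy was high.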
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,303 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,155 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,555 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,453 citations