OpenAlex · Updated hourly · Last updated: 25.03.2026, 19:15

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Various AI Tools Versus ENT Experts in the Diagnosis of ENT Diseases: A Cross-Sectional Comparative Study

2026 · 0 Citations · Open Access
Open full text at the publisher

0 Citations · 3 Authors · Year: 2026

Abstract

Purpose: Numerous Large Language Models (LLMs) have demonstrated exceptional capabilities in processing and comprehending natural language data. Assessing the applicability of these LLMs in healthcare settings is of paramount importance. This study aims to evaluate the accuracy of various LLMs in diagnosing Ear, Nose, and Throat (ENT) pathology and to compare their performance to that of ENT experts.

Methods: We conducted a cross-sectional comparative study in which 32 real ENT cases were presented to ChatGPT-4, ChatGPT-5.2, Microsoft Copilot, ENT physicians, and ENT residents. Each participant or LLM provided three differential diagnoses. The study analyzed diagnostic accuracy rates and inter-rater agreement between participant groups and the LLMs.

Results: ChatGPT-5.2 achieved the highest accuracy rate (90.1%), which was significantly higher than that of all other LLMs and of both ENT physicians and residents. ChatGPT-4 demonstrated an accuracy of 71.9%, which did not differ significantly from ENT physicians but was significantly higher than that of ENT residents. In contrast, Microsoft Copilot showed a significantly lower correctness rate compared with both ENT physicians (25% vs. 75.9%, p < 0.001) and ENT residents (25% vs. 70.3%, p < 0.001). Inter-rater agreement between ENT physicians and each LLM was poor. Regarding identification of the most critical diagnosis, ChatGPT-4, ChatGPT-5.2, and Microsoft Copilot mentioned it in 62.5%, 80.2%, and 84.4% of cases, respectively.

Conclusion: ChatGPT-4 demonstrated diagnostic accuracy comparable to that of ENT physicians, whereas ChatGPT-5.2 achieved substantially higher performance. In contrast, Microsoft Copilot exhibited significantly lower overall diagnostic accuracy compared with ENT experts (physicians and residents). However, with respect to identification of the most critical diagnoses, Microsoft Copilot outperformed both ChatGPT models.



Topics

Artificial Intelligence in Healthcare and Education · COVID-19 diagnosis using AI · Voice and Speech Disorders