This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Specificity, sensitivity and accuracy of generative artificial intelligence chatbots in chest X-ray interpretation
Citations: 0
Authors: 2
Year: 2025
Abstract
Background: Generative artificial intelligence (AI) has shown potential in providing patients with general medical information, but it is also important to evaluate how effective it is as clinical decision support for X-ray interpretation.
Objective: To assess the specificity, sensitivity and accuracy of generative AI chatbots in chest X-ray interpretation.
Methods: In February 2025 we presented 180 chest X-rays to free generative AI chatbots (ChatGPT 3.5, Mistral, Claude) and asked them to perform a radiological evaluation of each image. From the true positive, false positive, true negative and false negative answers we calculated the specificity, sensitivity and accuracy of each chatbot.
Results: Mistral showed the highest sensitivity (41.3%), compared with ChatGPT (36%) and Claude (35.7%). Mistral also had the highest sensitivity in detecting consolidation (50%), decreased density (46.7%), interstitial changes (33.3%), and nodules (41.3%). However, ChatGPT was best at detecting atelectasis (70%). ChatGPT showed the best specificity (46.7%) and Mistral the worst (6.7%); the specificity of Claude was 16.7%. Accuracy was 37.8% for ChatGPT, 35.0% for Mistral, and 34.2% for Claude.
Conclusion: The sensitivity of the generative AI chatbots ranged from 35.7% to 41.3%. ChatGPT achieved the best sensitivity (70%) in detecting atelectasis. Specificity varied from 6.7% (Mistral) to 46.7% (ChatGPT). The accuracy of the chatbots was poor and did not exceed 38%.
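The abstract derives sensitivity, specificity and accuracy from counts of true positive, false positive, true negative and false negative answers. The sketch below shows the standard definitions of these three measures. The `ConfusionCounts` type, the function names and the example counts are illustrative assumptions and are not taken from the paper, which does not report its raw counts on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Answer counts for one chatbot (hypothetical container, not from the paper)."""
    tp: int  # abnormal image, abnormality reported
    fp: int  # normal image, abnormality reported
    tn: int  # normal image, reported as normal
    fn: int  # abnormal image, abnormality missed


def _ratio(numerator: int, denominator: int) -> float:
    # Guard against an empty class, where the measure is undefined.
    if denominator == 0:
        raise ValueError("measure is undefined when the denominator is zero")
    return numerator / denominator


def sensitivity(c: ConfusionCounts) -> float:
    """Share of abnormal images that were flagged: TP / (TP + FN)."""
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of normal images that were reported as normal: TN / (TN + FP)."""
    return _ratio(c.tn, c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Share of all answers that were correct: (TP + TN) / total."""
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)


if __name__ == "__main__":
    # Made-up counts purely for illustration; they are NOT the study's data.
    example = ConfusionCounts(tp=30, fp=10, tn=10, fn=50)
    print(f"sensitivity = {sensitivity(example):.1%}")
    print(f"specificity = {specificity(example):.1%}")
    print(f"accuracy    = {accuracy(example):.1%}")
```

Because accuracy pools both classes, a data set dominated by abnormal images pulls accuracy toward the sensitivity value, which is consistent with the reported accuracies lying close to the reported sensitivities.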
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,260 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,116 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,493 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,438 citations