This is an overview page with metadata for this scientific work. The full article is available from the publisher.
AI Performance on Image-based Medical Case Scenarios: A Cross-Sectional Comparative Study
Citations: 0
Authors: 6
Year: 2025
Abstract
Background: Large language models (LLMs) have shown remarkable progress in text-based tasks, but their ability to interpret and respond to image-based clinical scenarios remains underexplored. This study evaluated and compared the performance of ChatGPT-5 and Claude in answering subjective image-based medical case questions.

Methods: A cross-sectional comparative study was conducted using 71 subjective questions based on dermatological case scenarios designed by the research team. Each AI system generated responses to identical visual and textual inputs without external assistance. Two experienced dermatologists, blinded to model identity, independently scored the responses against standard answers. Inter-rater reliability was assessed using intraclass correlation coefficients (ICC), and comparative analyses employed Mann–Whitney U tests, Bland–Altman plots, and correlation metrics.

Results: Both evaluators demonstrated excellent inter-rater reliability (ICC > 0.86). Claude achieved higher mean scores (27.39 ± 11.44) than ChatGPT-5 (25.53 ± 11.45; p < 0.001). Claude also showed stronger correlation with reference standards (ρ = 0.88 vs. 0.83), lower mean absolute error (14.76% vs. 19.98%), and reduced root mean square error (7.24 vs. 9.24). Bland–Altman analysis revealed minimal systematic bias between evaluators, indicating consistent scoring reliability.

Conclusions: Both multimodal LLMs demonstrated strong competence in interpreting image-based medical scenarios. Claude exhibited a modest but consistent advantage in diagnostic reasoning and clinical alignment. These findings support the potential of LLMs as supplementary educational tools in visual disciplines such as dermatology, emphasizing the importance of model selection, supervised use, and continued evaluation as AI integration in medical education expands.
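The comparative analyses named in the Methods section (Mann–Whitney U test, rank correlation with a reference standard, MAE/RMSE, and Bland–Altman bias) can be sketched with SciPy and NumPy. All score arrays below are hypothetical illustrations, not the study's data; ICC itself is not in SciPy and would need a dedicated package.

```python
import numpy as np
from scipy import stats

# Hypothetical evaluator scores for two models and a reference standard
# (made-up numbers for illustration only).
chatgpt = np.array([22, 30, 18, 27, 25, 31, 20, 29, 24, 26], dtype=float)
claude = np.array([25, 32, 19, 29, 27, 33, 23, 30, 26, 28], dtype=float)
reference = np.array([26, 33, 21, 30, 28, 34, 24, 31, 27, 29], dtype=float)

# Mann-Whitney U: non-parametric comparison of the two score distributions.
u_stat, p_value = stats.mannwhitneyu(claude, chatgpt, alternative="two-sided")

# Spearman rank correlation of each model's scores with the reference.
rho_claude, _ = stats.spearmanr(claude, reference)
rho_chatgpt, _ = stats.spearmanr(chatgpt, reference)

# Error metrics against the reference standard.
mae = np.mean(np.abs(claude - reference))
rmse = np.sqrt(np.mean((claude - reference) ** 2))

# Bland-Altman summary: mean difference (bias) and 95% limits of agreement.
diff = claude - chatgpt
bias = diff.mean()
loa = (bias - 1.96 * diff.std(ddof=1), bias + 1.96 * diff.std(ddof=1))

print(f"U={u_stat:.1f}, p={p_value:.3f}, rho={rho_claude:.2f}, "
      f"MAE={mae:.2f}, RMSE={rmse:.2f}, bias={bias:.2f}")
```

The same pattern extends to paired per-question comparisons (e.g. `stats.wilcoxon`) when both models answer the identical 71 cases.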
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,245 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,100 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,466 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,429 citations