This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Evaluating the Reliability of GPT-4o in Histological Image Interpretation
Citations: 0
Authors: 5
Year: 2025
Abstract
Advanced large language models with multimodal capabilities offer potential new applications in medical education. This study evaluated GPT-4o's performance in interpreting normal histology images. We assessed GPT-4o's ability to interpret 120 histological images across four tissue types at three magnification levels. Three histology experts evaluated the responses using a 4-point rubric across three assessment criteria: tissue/organ identification, structure identification, and structure function assessment. Statistical analysis included ANOVA with Tukey tests, three-way ANOVA for interaction effects, Pearson's correlation, and the intraclass correlation coefficient (ICC) for reliability. GPT-4o achieved an overall mean score of 2.71 (SE 0.07), with 59.01% of responses rated "Good" or "Excellent." Performance varied significantly across tissues, with epithelial tissue showing the highest accuracy (mean 3.11, SE 0.06) and muscle the lowest (mean 2.43, SE 0.07). Combining all three magnifications yielded better results (mean 3.03, SE 0.07) than low magnification alone (mean 2.41, SE 0.07; p < 0.001). Tissue/organ identification questions received higher scores (mean 2.83) than structure identification (mean 2.65) and structure function assessment (mean 2.64). Inter-rater reliability was excellent (ICC = 0.89). GPT-4o demonstrates moderate histological interpretation ability that varies by tissue type and magnification level, and the model performs best when given multiple magnification views. These findings suggest potential use in medical education but indicate the need for instructor supervision.