This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Evaluating Multimodal Large Language Models for Clinical Diagnosis of Oral Lesions: A Biomedical Informatics Perspective
Citations: 1 · Authors: 4 · Year: 2025
Abstract
Accurate diagnosis of oral lesions remains challenging due to overlapping clinical features and the high risk of missing malignancies. Multimodal large language models (LLMs), such as ChatGPT-4 and Google's Gemini Pro 2.5, can assist clinicians by integrating textual and visual data. This study compared these models with human experts in diagnosing oral lesions and quantified the added value of clinical images (photographs and radiographs) on diagnostic accuracy. A total of 160 case vignettes with intraoral images were evaluated using ChatGPT-4 and Gemini Pro 2.5, with Top-1, Top-3, and Top-5 accuracy metrics benchmarked against two oral medicine specialists. Each model was tested with and without images, and analyses included Cochran's Q, McNemar tests with Bonferroni correction, Cohen's <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$h$</tex>, and logistic regression. With images, ChatGPT-4 achieved 63.7 % Top-1 accuracy versus Gemini's 71.2 % and experts' 87.5 %. ChatGPT-4 improved significantly with image input (+13.8 points, <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$p=0.017$</tex>), while Gemini's gain was smaller and non-significant. Both models reached <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\sim 95 \%$</tex> Top-3 accuracy, closing the gap with experts. Visual input was most beneficial in high-difficulty and morphologically complex cases, while radiographs offered limited additional value. These findings underscore the promise of multimodal LLMs as assistive tools in oral diagnostics and the need for cautious, evidence-based clinical integration.
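As a minimal sketch of the effect-size measure named in the abstract, Cohen's h for comparing two proportions is defined as h = 2·arcsin(√p₁) − 2·arcsin(√p₂). The snippet below is an illustrative computation only, not the paper's code; the input proportions are the Top-1 accuracies reported in the abstract.

```python
import math

def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h effect size for two proportions:
    h = 2*arcsin(sqrt(p1)) - 2*arcsin(sqrt(p2))."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

# Reported Top-1 accuracies with images (from the abstract):
# ChatGPT-4: 63.7 %, Gemini Pro 2.5: 71.2 %, experts: 87.5 %
h_gemini_vs_gpt = cohens_h(0.712, 0.637)   # effect of model choice
h_experts_vs_gemini = cohens_h(0.875, 0.712)  # remaining expert gap
```

By convention, |h| around 0.2 is a small effect, 0.5 medium, and 0.8 large, which gives a scale-free reading of the accuracy gaps above.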
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,245 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,102 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,468 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,429 citations