This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Clinical Relevance of Large Language Models in Endodontics: Diagnostic Appropriateness Based on 50 Simulated Case Scenarios
Citations: 0
Authors: 3
Year: 2025
Abstract
Large language models (LLMs) are increasingly used in healthcare, but their performance in endodontic decision-making remains unclear. This study aimed to compare six LLMs in terms of diagnostic appropriateness for endodontic treatment planning. Fifty clinical scenarios were developed and entered into six LLMs (ChatGPT-4o, ChatGPT-3.5, Claude 4, Copilot, DeepSeek-V3, Gemini 2.5). Two specialists scored responses as appropriate or inappropriate. Repeated measures ANOVA and chi-square tests were used for analysis. Claude showed the highest accuracy (76%), followed by DeepSeek and Gemini. ChatGPT-3.5 had the lowest (40%). Significant differences were found between models (p < 0.05). Performance was better on straightforward cases than on complex scenarios. LLMs vary widely in diagnostic accuracy for endodontic cases. While some models show promise, others may provide confidently incorrect recommendations. Caution and human oversight remain essential until domain-specific, fine-tuned models are developed.
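The abstract reports a chi-square comparison of model accuracies. A minimal sketch of such a test, assuming each model's accuracy applies to the same 50 scenarios (so Claude 4's 76% corresponds to 38 appropriate responses and ChatGPT-3.5's 40% to 20; the exact per-model counts are an assumption, not taken from the paper):

```python
# Hypothetical sketch of the abstract's chi-square comparison.
# Counts are inferred from the reported accuracies (76% and 40% of 50
# cases), not taken from the paper itself.
from scipy.stats import chi2_contingency

# Rows: models; columns: [appropriate, inappropriate] out of 50 cases.
observed = [
    [38, 12],  # Claude 4 (76% appropriate)
    [20, 30],  # ChatGPT-3.5 (40% appropriate)
]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
```

For this 2x2 table the test yields p < 0.05, consistent with the significant between-model differences the abstract reports.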
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,250 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,109 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,482 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,434 citations