This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Human versus artificial intelligence in oral pathology diagnosis: a comparative study of ChatGPT, Grok, and MANUS
Citations: 0
Authors: 3
Year: 2026
Abstract
Artificial intelligence (AI) integration in diagnostic medicine has advanced accuracy and efficiency, particularly in pathology. This study assessed the diagnostic performance of three large language models (LLMs), ChatGPT (GPT-4-turbo), Grok (xAI), and MANUS, in interpreting histopathology slides of oral lesions. A comparative diagnostic study was conducted using 100 high-resolution slides representing diverse oral pathologies. Images were sourced from a validated textbook and reviewed by two board-certified oral pathologists who provided consensus diagnoses. Each slide was analysed twice by the three AI models using standardized prompts. Diagnostic accuracy, intra-model consistency, inter-model concordance, and agreement with human experts were evaluated using descriptive statistics, Cohen's kappa, McNemar's test, and chi-square analysis. All AI models demonstrated high diagnostic accuracy. In the second round, Grok achieved the highest accuracy (97%), followed by MANUS (96%) and ChatGPT (94%). ChatGPT showed the highest intra-model consistency (κ = 0.918), while MANUS and Grok displayed substantial agreement (κ = 0.790 and 0.740, respectively). Expert pathologists achieved 98% accuracy. Comparisons between AI models and human diagnoses showed moderate to substantial agreement, with MANUS most aligned with experts. Most misclassifications occurred in histologically ambiguous cases, with no significant differences between AI models. Multimodal LLMs demonstrated strong diagnostic capabilities, consistency, and alignment with expert reasoning in oral histopathology interpretation. Grok was the most accurate, ChatGPT the most consistent, and MANUS the most expert-aligned. These findings support AI integration into digital pathology for diagnostic support, education, and quality assurance, with further validation in clinical datasets recommended.
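The intra-model consistency figures reported above (e.g. κ = 0.918 for ChatGPT) are Cohen's kappa values, which correct observed agreement for agreement expected by chance. The abstract does not include the authors' code; the following is a minimal sketch of the standard kappa formula, with hypothetical run-1/run-2 labels for illustration only.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two label sequences."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items labelled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence, from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    p_e = sum((freq_a[lab] / n) * (freq_b[lab] / n) for lab in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical first-run vs second-run diagnoses for one model
# (1 = matched the consensus diagnosis, 0 = did not); not the study's data.
run1 = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
run2 = [1, 1, 1, 0, 1, 1, 1, 1, 1, 0]
print(round(cohens_kappa(run1, run2), 3))  # → 0.375
```

Note that with 80% prevalence of correct answers in each run, 80% raw agreement reduces to κ = 0.375, which is why the study reports kappa rather than raw agreement percentages for consistency.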
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,349 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,219 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,631 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,480 citations