This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Evaluation of GPT-5, a Large Language Model, in Replicating German Clinical Practice Guideline Recommendations in Oral Oncology: A Cross-Sectional Concordance Study
Citations: 0
Authors: 5
Year: 2026
Abstract
BACKGROUND: Artificial intelligence (AI) technologies, particularly large language models (LLMs) such as ChatGPT, are increasingly utilised in medical education and clinical information retrieval. Nevertheless, their capacity to accurately reproduce recommendations from established clinical practice guidelines (CPGs) has not been thoroughly examined. The present study evaluated the concordance between responses generated by GPT-5 and recommendations contained in German CPGs addressing oral potentially malignant disorders (OPMDs) and oral carcinomas (OCs). METHODS: A cross-sectional analytical comparison was performed between GPT-5 outputs and German CPG recommendations available as of October 2025. Individual guideline statements were entered verbatim into GPT-5, which was asked to confirm or reject the statements. To assess methodological robustness, inverted versions of the same statements were additionally tested. GPT-5 was accessed through the free version without internet connectivity to ensure that responses originated solely from the model's internal training data. Accuracy was defined as the proportion of correctly classified statements. Concordance between guideline content and model responses was quantified using Cohen's κ. RESULTS: Two German CPGs comprising 111 recommendations were included: the S2k guideline for OPMDs (15 recommendations) and the S3 guideline for OCs (96 recommendations). GPT-5 correctly affirmed all authentic recommendations and rejected all inverted statements. Agreement between guideline statements and GPT-5 responses was perfect when the original recommendations were analysed (κ = 1.0) and remained very high when both original and inverted statements were evaluated jointly (κ = 0.96). The majority of references cited within the guidelines were published in English (> 93%) and originated from outside Germany (> 77%).
CONCLUSION: When guideline recommendations were presented verbatim, GPT-5 demonstrated complete concordance with German oral oncology CPGs. These findings indicate that the model is capable of recognising and retrieving established guideline information. However, this experimental design evaluates recognition of existing statements rather than autonomous clinical reasoning. At present, LLMs should therefore be regarded primarily as educational and informational tools rather than a replacement for expert clinical judgement in oral oncology.
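The concordance metric used in the abstract, Cohen's κ, corrects raw agreement for agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). A minimal sketch of the computation, using made-up labels for illustration (not the study's data):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa: chance-corrected agreement between two raters.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e is the agreement expected by chance, derived
    from each rater's marginal label frequencies.
    """
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    freq_a, freq_b = Counter(a), Counter(b)
    # chance agreement: sum over labels of the product of marginals
    p_e = sum(freq_a[label] / n * freq_b[label] / n for label in freq_a)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels, not taken from the study:
guideline = ["affirm", "affirm", "reject", "reject"]
model     = ["affirm", "reject", "reject", "reject"]
print(round(cohens_kappa(guideline, model), 2))  # 0.5
```

With perfect agreement (as for the original recommendations, κ = 1.0) p_o equals 1 and the formula yields κ = 1 for any p_e below 1; the function name and example data above are illustrative only.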
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,644 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,550 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,061 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,850 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Authors
Institutions
- Danube Private University (AT)
- Universität für Weiterbildung Krems (AT)
- University Hospital Ulm (DE)
- Chulalongkorn University (TH)
- Medizinische Hochschule Brandenburg Theodor Fontane (DE)
- Universitätsklinikum Brandenburg an der Havel (DE)
- Goethe University Frankfurt (DE)
- Bayreuth Medical Center (DE)
- Sana Klinikum Offenbach (DE)