OpenAlex · Updated hourly · Last updated: 17.05.2026, 02:38

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Large language model bias auditing for periodontal diagnosis using an ambiguity-probe methodology: a pilot study

2026 · 0 citations · Frontiers in Digital Health · Open Access

Citations: 0 · Authors: 1 · Year: 2026

Abstract

Background: Large language models (LLMs) in healthcare hold immense promise yet carry the risk of perpetuating social biases. While artificial intelligence (AI) fairness is a growing concern, a gap exists in understanding how these models perform under conditions of clinical ambiguity, a common feature of real-world practice. Methods: We conducted a study using an ambiguity-probe methodology with a set of 42 sociodemographic personas and 15 clinical vignettes based on the 2018 classification of periodontal diseases. Ten vignettes were clear-cut scenarios with established ground truths, while five were intentionally ambiguous. OpenAI's GPT-4o and Google's Gemini 2.5 Pro were prompted to provide periodontal stage and grade assessments for 630 vignette-persona combinations per model. Results: In clear-cut scenarios, GPT-4o demonstrated significantly higher combined (stage and grade) accuracy (70.5%) than Gemini 2.5 Pro (33.3%). However, a fairness analysis using cumulative link models with false discovery rate correction revealed no statistically significant sociodemographic bias in either model. This finding held across both clear-cut and ambiguous clinical scenarios. Conclusion: To our knowledge, this is among the first studies to use simulated clinical ambiguity to reveal the distinct ethical fingerprints of LLMs in a dental context. While LLM performance gaps exist, our analysis decouples accuracy from fairness, demonstrating that both models maintain sociodemographic neutrality. We find that the observed errors reflect diagnostic boundary instability rather than bias. This highlights a critical need for future research to differentiate between these two distinct types of model failure in order to build genuinely reliable AI.
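The design described in the abstract (42 personas crossed with 15 vignettes, followed by false discovery rate correction of the fairness tests) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the persona and vignette labels are placeholders, the abstract does not name the FDR procedure, so Benjamini-Hochberg is assumed here, and the p-values in the usage section are hypothetical.

```python
from itertools import product

# Placeholder labels: the abstract reports only the counts, not the contents.
personas = [f"persona_{i:02d}" for i in range(1, 43)]        # 42 personas
clear_cut = [f"clear_{i:02d}" for i in range(1, 11)]         # 10 with ground truth
ambiguous = [f"ambiguous_{i:02d}" for i in range(1, 6)]      # 5 intentionally ambiguous
vignettes = clear_cut + ambiguous

# Full crossing of vignettes and personas; each model would see every pair once.
prompts = list(product(vignettes, personas))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumed procedure: the abstract only says "false discovery rate correction".
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    print(len(prompts))
    hypothetical_p = [0.01, 0.04, 0.03, 0.20]  # made-up values for illustration
    print(benjamini_hochberg(hypothetical_p))
```

In a real analysis the p-values would come from the cumulative link models, one per tested sociodemographic term, and an adjusted value below the chosen threshold would flag a candidate bias effect.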

Topics

Artificial Intelligence in Healthcare and Education · Oral microbiology and periodontitis research · Dental Radiography and Imaging