OpenAlex · Updated hourly · Last updated: 12.03.2026, 05:41

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Developing and Testing an Engineering Framework for Curiosity-Driven and Humble AI in Clinical Decision Support

2026 · 0 citations · Open Access

Citations: 0
Authors: 19
Year: 2026

Abstract

Background: We present BODHI (Balanced, Open-minded, Diagnostic, Humble, and Inquisitive), an engineering framework for curiosity-driven and humble clinical decision support AI. Despite growing capabilities, large language models (LLMs) often express inappropriate confidence, conflating statistical pattern recognition with genuine medical understanding. BODHI addresses this through a dual-reflective architecture that (1) decomposes epistemic uncertainty into task-specific dimensions and (2) constrains model responses using virtue-based stance rules derived from a Virtue Activation Matrix.

Methods: We validate the framework through controlled evaluation on 200 clinical vignettes from HealthBench Hard, assessing GPT-4o-mini and GPT-4.1-mini across 5 random seeds (1,800 total observations). Statistical analysis included bootstrap resampling, paired t-tests, and effect size computation (Supplementary Materials S3).

Findings: BODHI significantly improved overall clinical response quality (GPT-4.1-mini: +17.3pp, p < 0.0001, Cohen's d = 0.50; GPT-4o-mini: +7.4pp, p < 0.0001, Cohen's d = 0.22) while achieving very large effect sizes on curiosity (context-seeking rate: Cohen's d = 16.38 and 19.54) and humility (hedging: d = 5.80 for GPT-4.1-mini) metrics. Crucially, 97.3% of GPT-4.1-mini responses and 73.5% of GPT-4o-mini responses included appropriate clarifying questions, compared to 7.8% and 0.0% at baseline, demonstrating the framework's effectiveness in eliciting information-gathering behavior.

Interpretation: These findings suggest LLMs can be reliably constrained to operate within epistemic boundaries when provided with structured uncertainty decomposition and virtue-aligned response rules, offering a pathway toward safer clinical AI deployment.
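The Methods section names three standard statistical tools: a paired t-test, Cohen's d for paired samples, and bootstrap resampling. The sketch below illustrates how such an analysis is typically computed; the per-vignette scores here are simulated placeholders, not the paper's data, and the exact procedure in Supplementary Materials S3 may differ.

```python
import math
import random
import statistics as st

random.seed(0)
n = 200  # number of clinical vignettes (matches the evaluation size)

# Illustrative per-vignette quality scores (NOT the paper's data):
# a baseline model and a framework-conditioned variant of the same model.
baseline = [random.gauss(0.50, 0.15) for _ in range(n)]
framework = [b + random.gauss(0.17, 0.10) for b in baseline]

# Paired design: analyze the per-vignette score differences.
diff = [x - y for x, y in zip(framework, baseline)]
mean_d = st.mean(diff)
sd_d = st.stdev(diff)

# Paired t statistic: mean difference over its standard error.
t_stat = mean_d / (sd_d / math.sqrt(n))

# Cohen's d for paired samples: mean difference over SD of differences.
cohens_d = mean_d / sd_d

# Percentile bootstrap 95% CI on the mean improvement (2,000 resamples).
boots = sorted(st.mean(random.choices(diff, k=n)) for _ in range(2000))
ci_lo, ci_hi = boots[49], boots[1949]  # ~2.5th and ~97.5th percentiles
```

With a paired design, both the t-test and the effect size operate on the difference scores, which removes between-vignette variance; this is why paired comparisons across the same 200 vignettes are more sensitive than comparing two independent samples.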
