OpenAlex · Updated hourly · Last updated: 19 March 2026, 17:26

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Hippocrates-o1: A Guideline-Aware, Orchestrated, Self-Refining Protocol for Specialty-Specific Clinical Reasoning

2025 · 0 citations · 17 authors

Abstract

Background: Clinical decision support requires language models that provide guideline-aligned, context-aware reasoning with clear justification. Many existing benchmarks emphasize multiple-choice or short-form question answering and mainly capture factual recall rather than longitudinal clinical reasoning from extended clinical notes. Hippocrates-o1 is a family of domain-tailored clinical reasoning pipelines that combine structured prompts, guideline-informed retrieval, and iterative self-refinement across oncology, general surgery, and vascular surgery.

Methods: Real-world head and neck cancer cases were drawn from the MIMIC-IV-Note database, and a subset (n=20) was randomly selected for detailed annotation. Six physicians adjudicated treatment phase and intent using structured criteria and rated the model outputs. For each case, we generated outputs with both a general-purpose baseline model (VanillaLLM) and our oncology-specific reasoning model, Hippocrates-Karkinos-o1. Experts rated the outputs on a scale of 1 to 5 across five dimensions: Clinical Knowledge Application, Contextual Understanding, Reasoning Transparency, Chain-of-Thought Quality, and Hallucination Audit. Overall Reasoning was defined as the mean of the five domain scores. To explore whether the approach extends beyond oncology, we also processed inguinal hernia and aortic aneurysm cases through the Hippocrates-Chirurgos-o1 and Hippocrates-Angios-o1 domain adaptations.

Results: Across paired ratings, Hippocrates-Karkinos-o1 improved Overall Reasoning from 3.40±0.90 to 4.00±0.73 (p<0.001). Domain scores also increased for Clinical Knowledge Application (2.87±1.20 to 3.70±1.03), Contextual Understanding (3.48±0.95 to 3.98±0.95), Hallucination Audit (3.90±1.32 to 4.74±0.76), Reasoning Transparency (3.45±1.02 to 3.86±0.87), and Chain-of-Thought Quality (3.32±1.04 to 3.69±1.00); all p≤0.001. The surgical and vascular adaptations showed parallel qualitative improvements.
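The Methods define Overall Reasoning as the mean of the five domain scores, each rated from 1 to 5. The Python below is a minimal sketch of that aggregation plus a paired before/after summary. The `Rating` class, the `summarize_paired` helper, the snake_case domain keys, and the assumption that ratings pair by index (same case, same rater) are hypothetical illustrations, not the authors' analysis code.

```python
from dataclasses import dataclass
from statistics import mean, stdev

# The five rubric dimensions named in the abstract (hypothetical key names).
DOMAINS = (
    "clinical_knowledge_application",
    "contextual_understanding",
    "reasoning_transparency",
    "chain_of_thought_quality",
    "hallucination_audit",
)


@dataclass
class Rating:
    """One expert's 1-5 scores for one model output (hypothetical structure)."""
    scores: dict[str, int]

    def overall_reasoning(self) -> float:
        # Overall Reasoning = unweighted mean of the five domain scores.
        missing = [d for d in DOMAINS if d not in self.scores]
        if missing:
            raise ValueError(f"missing domain scores: {missing}")
        for d in DOMAINS:
            if not 1 <= self.scores[d] <= 5:
                raise ValueError(f"{d} out of range 1-5: {self.scores[d]}")
        return mean(self.scores[d] for d in DOMAINS)


def summarize_paired(baseline: list[Rating], tailored: list[Rating]) -> dict[str, float]:
    """Mean and SD of Overall Reasoning per model, plus the mean paired difference.

    Assumes baseline[i] and tailored[i] rate the same case by the same rater.
    """
    if len(baseline) != len(tailored) or len(baseline) < 2:
        raise ValueError("need at least two paired ratings of equal length")
    base = [r.overall_reasoning() for r in baseline]
    tail = [r.overall_reasoning() for r in tailored]
    diffs = [t - b for b, t in zip(base, tail)]  # per-pair improvement
    return {
        "baseline_mean": mean(base),
        "baseline_sd": stdev(base),
        "tailored_mean": mean(tail),
        "tailored_sd": stdev(tail),
        "mean_paired_diff": mean(diffs),
    }
```

For instance, hypothetical domain scores of 3, 4, 3, 3, and 4 give an Overall Reasoning of 3.4. The abstract does not say which paired test produced its p-values, so the sketch stops at descriptive statistics rather than guessing one.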
Conclusions: The Hippocrates-o1 protocol improved reasoning fidelity, guideline alignment, and factual grounding relative to a general-purpose model and generalized across oncology, surgery, and vascular care. Orchestrated retrieval and self-refinement provide a reproducible template for evaluating and enhancing clinical reasoning in medical AI.
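The abstract names the pipeline's ingredients (structured prompts, guideline-informed retrieval, and iterative self-refinement) but not their implementation. As a rough illustration only, here is a generic retrieve, draft, critique, and revise loop in Python. Everything in it is an assumption: `Draft`, `retrieve`, `self_refine`, the naive token-overlap ranking, and the early-stopping rule are hypothetical, and `generate`/`critique` stand in for LLM calls that the example replaces with toy functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Draft:
    """A model answer plus the IDs of the guideline snippets it cites (hypothetical)."""
    text: str
    cited: list[str] = field(default_factory=list)


# Hypothetical signatures standing in for prompted LLM calls.
GenerateFn = Callable[[str, dict[str, str], Optional[list[str]]], Draft]
CritiqueFn = Callable[[Draft, dict[str, str]], list[str]]


def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """Return up to k snippet IDs ranked by naive token overlap with the query.

    A stand-in for real guideline retrieval; snippets with no overlap are dropped.
    """
    q = set(query.lower().split())
    scored = [(len(q & set(text.lower().split())), gid) for gid, text in corpus.items()]
    hits = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [gid for _, gid in hits[:k]]


def self_refine(case_note: str, corpus: dict[str, str], generate: GenerateFn,
                critique: CritiqueFn, max_rounds: int = 3) -> tuple[Draft, int]:
    """Retrieve evidence, draft an answer, then critique and revise it.

    Returns the final draft and the number of revision rounds performed.
    """
    evidence = {gid: corpus[gid] for gid in retrieve(case_note, corpus)}
    draft = generate(case_note, evidence, None)  # initial draft, no feedback yet
    for rounds in range(max_rounds):
        feedback = critique(draft, evidence)
        if not feedback:  # critic found nothing to fix: stop early
            return draft, rounds
        draft = generate(case_note, evidence, feedback)  # revise using the feedback
    return draft, max_rounds  # budget exhausted; last revision is returned unchecked
```

In a real system, `generate` might wrap a structured prompt to a model and `critique` a second pass that flags uncited or unsupported claims; the fixed `max_rounds` budget keeps the loop bounded even if the critic is never satisfied.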
