This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

From Feedback to Checklists: Grounded Evaluation of AI-Generated Clinical Notes

2025 · 0 citations · Open Access

Abstract

AI-generated clinical notes are increasingly used in healthcare, but evaluating their quality remains a challenge due to high subjectivity and limited scalability of expert review. Existing automated metrics often fail to align with real-world physician preferences. To address this, we propose a pipeline that systematically distills real user feedback into structured checklists for note evaluation. These checklists are designed to be interpretable, grounded in human feedback, and enforceable by LLM-based evaluators. Using deidentified data from over 21,000 clinical encounters (prepared in accordance with the HIPAA safe harbor standard) from a deployed AI medical scribe system, we show that our feedback-derived checklist outperforms a baseline approach in our offline evaluations in coverage, diversity, and predictive power for human ratings. Extensive experiments confirm the checklist's robustness to quality-degrading perturbations, significant alignment with clinician preferences, and practical value as an evaluation methodology. In offline research settings, our checklist offers a practical tool for flagging notes that may fall short of our defined quality standards.
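
To make the idea of a checklist that is "enforceable by LLM-based evaluators" concrete, here is a minimal sketch of how a note might be scored against yes/no checklist items. It is an illustration under stated assumptions, not the authors' pipeline: the names ChecklistItem, score_note, and keyword_stub, the example questions, and the scoring rule (fraction of items passed) are all hypothetical, and the keyword check merely stands in for an LLM judge so the example runs without a model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ChecklistItem:
    """One yes/no quality criterion (hypothetical example, not from the paper)."""
    question: str
    keyword: str  # used only by the toy evaluator below


# An evaluator answers one checklist question about one note.
# In the setting the abstract describes, this would be an LLM-based judge;
# here it is any callable, so the sketch stays self-contained.
Evaluator = Callable[[str, ChecklistItem], bool]


def keyword_stub(note: str, item: ChecklistItem) -> bool:
    """Toy stand-in for an LLM judge: passes if the keyword appears in the note."""
    return item.keyword.lower() in note.lower()


def score_note(note: str, checklist: List[ChecklistItem], evaluate: Evaluator) -> float:
    """Return the fraction of checklist items the note satisfies."""
    if not checklist:
        raise ValueError("checklist must contain at least one item")
    passed = sum(1 for item in checklist if evaluate(note, item))
    return passed / len(checklist)


# Hypothetical checklist and note, invented for illustration only.
checklist = [
    ChecklistItem("Does the note include an assessment?", "assessment"),
    ChecklistItem("Does the note include a plan?", "plan"),
    ChecklistItem("Does the note document allergies?", "allergies"),
]
note = "Assessment: hypertension. Plan: start lisinopril."

score = score_note(note, checklist, keyword_stub)
flagged = score < 0.75  # the threshold is an arbitrary assumption
```

Passing the evaluator in as a callable keeps the scoring logic independent of any particular model, so a real LLM-backed judge could replace keyword_stub without changing score_note.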

Topics

Artificial Intelligence in Healthcare and Education
Clinical Reasoning and Diagnostic Skills
Machine Learning in Healthcare
