OpenAlex · Updated hourly · Last updated: 30.03.2026, 14:53

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Same logs, different voices: AI-generated vs human feedback during clinical clerkship in undergraduate education (Preprint)

2025 · 0 citations · 5 authors · Open Access
Open full text at publisher

Abstract

BACKGROUND
Feedback is essential for medical students' learning during clinical clerkships, yet supervising physicians often struggle to provide meaningful written feedback due to time constraints. Large language models (LLMs) offer a promising approach to supplement human feedback, but how AI-generated and human feedback differ in authentic clinical settings remains unclear. Previous studies have yielded inconsistent findings regarding feedback length, quality, and distinguishability, with most comparisons conducted in classroom or simulation contexts rather than clinical environments.

OBJECTIVE
To examine how AI-generated feedback and supervisor-provided feedback differ when applied to medical students' clinical clerkship logs, by identifying the distinct characteristics and complementary strengths of each feedback type.

METHODS
This mixed-methods study employed a convergent design. We collected 161 sets of weekly clinical clerkship logs from fifth- and sixth-year medical students at Nagoya University, Japan, along with corresponding supervisor feedback and AI-generated feedback produced with GPT-4o. Ten faculty members and ten medical students evaluated both feedback types using a validated rubric assessing five categories: criteria-based, clear directions for improvement, accuracy, prioritization, and supportive tone. Quantitative analyses included paired t-tests, cumulative link mixed models, and correlation analyses. Qualitative thematic analysis examined evaluators' open-ended comments. Results were integrated using Joint Display Analysis.

RESULTS
AI feedback was significantly longer than supervisor feedback (mean: 382 vs. 98 characters, p<0.001). AI feedback scored significantly higher on the criteria-based (OR=11.81, p<0.001) and clear-direction (OR=6.61, p<0.001) categories, with no significant differences in accuracy, prioritization, or supportive tone. AI feedback demonstrated greater quality consistency, while supervisor feedback showed higher variability (variance ratio 3.9:1). For supervisor feedback, length positively correlated with quality scores; no such correlation existed for AI feedback. All evaluators correctly identified feedback sources. Qualitative analysis of open-ended comments revealed five themes: adherence to feedback criteria and structure, continuity and consistency, perspective as a clinician, quality of Japanese language, and text length. AI provided structured, text-anchored feedback following rubric criteria, while supervisors offered experience-based feedback grounded in clinical context and professional expertise that sometimes lacked structured elements.

CONCLUSIONS
AI-generated and supervisor-provided feedback show distinct but complementary strengths. AI consistently delivers structured, criterion-based feedback aligned with written content, addressing gaps that may arise when time-pressured supervisors provide brief feedback. However, AI lacks the clinical perspective and contextual grounding that supervisors bring from direct observation and professional experience. These findings suggest that AI feedback should complement rather than replace human feedback in clinical clerkship settings, with each type addressing the other's limitations to optimize student learning.
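To make the paired comparison reported in the results concrete, the sketch below shows how such a length comparison could be run in Python. It is a minimal illustration only: the numbers are hypothetical placeholders rather than the study's data, scipy's ttest_rel stands in for the authors' actual analysis code, and the ordinal rubric scores in the paper were additionally analyzed with cumulative link mixed models, which this sketch does not cover.

# Minimal sketch of a paired t-test on feedback lengths, analogous to the
# study's quantitative comparison. All values are hypothetical placeholders,
# not data from the paper.
from scipy import stats

# Character counts of AI-generated vs. supervisor feedback for the same logs
ai_lengths = [410, 365, 398, 372, 390, 401]
supervisor_lengths = [95, 110, 88, 102, 97, 105]

# ttest_rel pairs observations by position: each clerkship log contributes
# one AI feedback and one supervisor feedback
t_stat, p_value = stats.ttest_rel(ai_lengths, supervisor_lengths)
print(f"t = {t_stat:.2f}, p = {p_value:.4g}")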

Topics

Simulation-Based Education in Healthcare · Innovations in Medical Education · Artificial Intelligence in Healthcare and Education