This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Poetic or Prosaic? Evaluating the Linguistic Quality of AI-Generated Draft Replies to Patient Portal Messages
Citations: 0
Authors: 8
Year: 2025
Abstract
Background: The use of generative artificial intelligence (genAI) in healthcare is increasing, including the use of GPT-generated draft replies (GDRs) to patient messages via Epic Systems’ electronic health record (EHR). We evaluated GDR use, quality, and impact in a large academic health system.
Methods: Thirty primary care physicians received GDRs from September 2023 to August 2024 during a staged rollout. Messages were grouped into baseline (GDRs not shown) and intervention (GDRs used). We evaluated messages using BLEU, ROUGE, cosine similarity, BERTScore, token counts, and Flesch Reading Ease, comparing baseline and intervention groups as well as prompt refinement phases (Phases 2–4 vs. Phase 1). Blinded evaluations of message quality were conducted via surveys, and BERTScores were correlated with physician ratings of effectiveness, misunderstanding, and harm.
Results: Of 66,200 GDRs generated, 21,073 were presented and 2,264 (11%) were used. Used GDRs aligned closely with the final messages (BLEU 0.49, 95% CI 0.43–0.56; ROUGE-L 0.60, 95% CI 0.54–0.66), with high BERTScores (F1 > 0.9). Final messages were longer and more readable. Prompt refinements increased token retention. GDR usage declined over time, yet providers reported time savings and reduced cognitive load. BERTScores correlated strongly with physician feedback on effectiveness and safety in the intervention group.
Conclusions: GPT-generated drafts show strong semantic alignment with physician messages and may support efficient communication. However, usage trends and readability challenges underscore the need for improved prompt design and better workflow integration. Quantitative metrics like BERTScore, paired with physician feedback, offer a scalable framework for evaluating AI-assisted messaging in healthcare.
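To make the overlap metrics in the Methods concrete, the sketch below computes ROUGE-L (longest-common-subsequence F1), one of the scores the study used to compare a GPT-generated draft with the final physician message. This is a minimal stdlib-only illustration, not the authors' pipeline, and the two example messages are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, tok_a in enumerate(a, 1):
        for j, tok_b in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if tok_a == tok_b else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 over whitespace tokens (simplified; no stemming)."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

# Hypothetical draft reply (GDR) vs. the message the physician actually sent.
draft = "Your lab results look normal. Please follow up in six months."
final = "Your lab results look normal. Follow up with us in six months."
print(round(rouge_l_f1(final, draft), 2))  # → 0.87
```

A score near the study's mean ROUGE-L of 0.60 would indicate substantial but edited reuse of the draft; production evaluations typically use the `rouge-score` package, which adds stemming and proper tokenization.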
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,245 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,100 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,466 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,429 citations