This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Evaluating Dutch-Language Ambient Listening in Simulated Clinical Encounters: Comparing Three Providers in a Multi-Speaker, Multi-Dialect Study (Preprint)
Citations: 0 · Authors: 7 · Year: 2026
Abstract
BACKGROUND: Clinicians spend substantial time on Electronic Health Record (EHR) documentation, often at the expense of patient interaction. Ambient listening technology uses artificial intelligence to passively record and summarize clinical encounters. While initial studies are promising, there is limited evidence on system performance in complex, non-English settings.

OBJECTIVE: To compare the documentation performance of three commercially available ambient listening systems in simulated Dutch-language outpatient consultations by assessing note completeness, correctness, and conciseness under predefined linguistic and interactional challenges.

METHODS: Standardized audio recordings of ten scripted physician–patient interactions in two specialties were used. Scenarios included multi-speaker dynamics (a patient companion), conversational disruptions (a nurse interruption), evasive patient communication, and a regional dialect (Gronings). Three distinct AI documentation systems (Provider A, Provider B, and Provider C) processed the audio files. Eight human raters evaluated the resulting AI-generated notes against reference summaries for completeness, conciseness, and correctness on a 5-point ordinal scale. Inter-rater agreement was assessed using Gwet's AC2. System-level technical characteristics were assessed alongside clinical performance to aid interpretation of between-vendor differences.

RESULTS: Across 351 ratings on the 1-5 scale, overall inter-rater agreement was high (Gwet's AC2 = 0.827). Mean scores were tightly clustered across providers (Provider C: 4.26; Provider B: 4.00; Provider A: 3.82) and were higher in Otolaryngology (mean 4.36) than in Surgical Oncology (mean 3.68). Across scoring domains, correctness received the highest mean score (4.21) and completeness the lowest (3.81). Mean scores varied across script scenarios: dialect-specific scenarios showed the lowest mean score (3.77) and the greatest variability across providers. Median summary generation times ranged from 13.5 seconds (Provider C) to 22.0 seconds (Provider B).

CONCLUSIONS: Ambient listening systems demonstrate good performance in Dutch clinical settings, even under conditions simulating common conversational challenges. While accuracy is generally high, performance is sensitive to linguistic variation. Future deployment studies must prioritize linguistic equity, real-world validation of efficiency gains, and evaluation of both clinician and patient perceptions to understand how these systems influence consultation dynamics and care delivery across diverse patient populations.
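The abstract reports inter-rater agreement with Gwet's AC2 on a 5-point ordinal scale (AC2 = 0.827). The following minimal Python sketch shows how that statistic is computed, assuming quadratic agreement weights and a complete items-by-raters rating matrix; the abstract does not state the study's weighting scheme or missing-data handling, and the toy data are illustrative only.

import numpy as np

# Minimal sketch of Gwet's AC2 (Gwet, 2014) for ordinal ratings.
# Assumptions not stated in the abstract: quadratic weights on the
# 1-5 scale and a complete items-by-raters matrix with no missing data.
def gwet_ac2(ratings, categories=(1, 2, 3, 4, 5)):
    """ratings: (n_items, n_raters) array of ordinal scores."""
    cats = np.asarray(categories, dtype=float)
    q = len(cats)
    n_items, n_raters = ratings.shape

    # Quadratic agreement weights: 1 on the diagonal, 0 at the extremes.
    diff = cats[:, None] - cats[None, :]
    w = 1.0 - (diff / (cats[-1] - cats[0])) ** 2

    # r_ik: how many raters placed item i in category k.
    r_ik = (ratings[:, :, None] == cats).sum(axis=1).astype(float)

    # Weighted observed agreement, averaged over items.
    r_star = r_ik @ w.T
    pa = ((r_ik * (r_star - 1)).sum(axis=1)
          / (n_raters * (n_raters - 1))).mean()

    # Weighted chance agreement, per Gwet's formulation.
    pi_k = (r_ik / n_raters).mean(axis=0)
    pe = (w.sum() / (q * (q - 1))) * (pi_k * (1 - pi_k)).sum()

    return (pa - pe) / (1 - pe)

# Toy example: 6 notes rated by 3 raters on the study's 1-5 scale.
scores = np.array([[4, 4, 5], [3, 3, 3], [5, 5, 4],
                   [2, 3, 3], [4, 4, 4], [5, 4, 5]])
print(round(gwet_ac2(scores), 3))

With identity weights (ones on the diagonal, zeros elsewhere) the same routine reduces to Gwet's unweighted AC1; the weighted AC2 credits near-misses on the ordinal scale, which is why it suits 5-point quality ratings like those used here.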
Similar Works
Making sense of Cronbach's alpha
2011 · 13,693 citations
Technology-Enhanced Simulation for Health Professions Education
2011 · 1,931 citations
The future vision of simulation in health care
2004 · 1,849 citations
Does Simulation-Based Medical Education With Deliberate Practice Yield Better Results Than Traditional Clinical Education? A Meta-Analytic Comparative Review of the Evidence
2011 · 1,704 citations
A critical review of simulation‐based medical education research: 2003–2009
2009 · 1,648 citations