OpenAlex · Updated hourly · Last updated: 29.03.2026, 23:17

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

1256 Generative AI-Based Dictation Outperforms Existing Speech-to-Text Software in Neurosurgical Practice

2025 · 0 citations · Neurosurgery

9 authors

Abstract

INTRODUCTION: The rapid ascent of artificial intelligence (AI) and advanced machine learning has revolutionized certain aspects of healthcare. Document dictation remains a significant clinical burden, contributing to burnout and detracting from quality patient care. Generative AI systems built on transformer architectures offer efficient speech processing methods, and adapting them could streamline clinical documentation.

METHODS: Ten previously written operative reports from both cranial and spinal neurosurgery procedures were dictated and recorded by three independent physicians. The corresponding audio files were processed by (1) a modified speech-to-text model built on the backbone architecture of OpenAI's Whisper model and (2) Nuance Dragon as a comparative commercial standard. Word error rate (WER) was determined by manual review.

RESULTS: Mean WER was 1.75% for Whisper and 1.54% for Dragon (p=0.08). Mean total error count was 12.1 for Whisper and 11.1 for Dragon (p=0.20). When linguistic errors were excluded, Whisper outperformed Dragon in both mean WER (0.50% vs 1.34%, p<0.001) and total errors (6.1 vs 9.7, p=0.002). Across all unstratified dictations, total errors correlated positively with word count (p<0.0001, R²=0.37) and with recording length (p<0.0001, R²=0.22). Words spoken per second correlated positively with total errors for Dragon (p<0.020, R²=0.18), but not for Whisper (p<0.204, R²=0.06). When only linguistic errors were analyzed, this trend held for Dragon (p<0.014, R²=0.20), but not for Whisper (p<0.331, R²=0.03).

CONCLUSIONS: AI-based models perform at a rate non-inferior to commercially available speech-to-text dictation programs. Generative models offer potential benefits, such as contextual inference, that show promise in limiting errors at higher dictation speeds and in compensating for impure input data.
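The abstract reports word error rate (WER) but does not say how it was computed beyond manual review. As an illustration only, the conventional definition is the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference text. The sketch below assumes lowercase whitespace tokenization; the function name `word_error_rate` and the example sentences are hypothetical and not taken from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count.

    Assumption: words are compared case-insensitively after splitting on
    whitespace; punctuation handling is left to the caller.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # reference word dropped (deletion)
                curr[j - 1] + 1,                  # extra hypothesis word (insertion)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical dictation snippet: one substitution and one deletion
# against an 8-word reference.
reference = "the dura was opened in a cruciate fashion"
hypothesis = "the dura was open in cruciate fashion"
print(f"WER = {word_error_rate(reference, hypothesis):.2%}")
```

Under this definition, the two errors in the 8-word example give a WER of 25%. The study's separation of linguistic from non-linguistic errors would need an additional labeling step that this sketch does not attempt.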

Similar works