OpenAlex · Updated hourly · Last updated: 16.03.2026, 17:54

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Comparing Speech Synthesis Models for Polish Medical Speech Naturalness

2025 · 0 citations · Proceedings of the International Conference on Information Systems Development · Open Access

Citations: 0 · Authors: 4 · Year: 2025

Abstract

This research investigates the perceived naturalness of synthesized speech in the context of Polish medical terminology, a critical factor for applications such as voice-enabled medical dialogue systems. We conducted a comparative analysis of three speech synthesis models: SpeechGen, ElevenLabs, and a version of ToucanTTS fine-tuned on a specialized corpus of Polish medical recordings. The evaluation combined objective measurement with the NISQA metric and subjective assessment through Mean Opinion Score (MOS) surveys. Our findings indicate that SpeechGen and ElevenLabs produce synthesized speech that closely rivals the naturalness of human speech, as evidenced by both NISQA scores and MOS ratings. In contrast, the fine-tuned ToucanTTS model, despite improvements, did not achieve comparable levels of perceived naturalness. Notably, participants occasionally rated the advanced synthesized speech as more natural than human speech recorded in non-studio environments, underscoring the potential of these technologies in real-world applications. This study emphasizes the significance of naturalness in enhancing user experience, particularly in specialized linguistic domains. It also provides insight into the current capabilities and limitations of speech synthesis for less-resourced languages such as Polish.

Topics

Speech and dialogue systems · Artificial Intelligence in Healthcare and Education · Speech Recognition and Synthesis