This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Comparing Speech Synthesis Models for Polish Medical Speech Naturalness

2025 · 0 citations · Proceedings of the International Conference on Information Systems Development · Open Access



Year: 2025

Abstract

This research investigates the perceived naturalness of synthesized speech in the context of Polish medical terminology, a critical factor for applications such as voice-enabled medical dialogue systems. We conducted a comparative analysis of three speech synthesis models: SpeechGen, ElevenLabs, and a version of ToucanTTS fine-tuned on a specialized corpus of Polish medical recordings. The evaluation combined an objective measure, the NISQA metric, with subjective assessments collected through Mean Opinion Score (MOS) surveys. Our findings indicate that SpeechGen and ElevenLabs produce synthesized speech that closely rivals the naturalness of human speech, as evidenced by both NISQA scores and MOS ratings. In contrast, the fine-tuned ToucanTTS model, despite improvements, did not reach comparable levels of perceived naturalness. Notably, participants occasionally rated speech from the advanced synthesizers as more natural than human speech recorded in non-studio environments, underscoring the potential of these technologies in real-world applications. This study emphasizes the importance of naturalness for user experience, particularly in specialized linguistic domains, and offers insight into the current capabilities and limitations of speech synthesis for less-resourced languages such as Polish.
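A Mean Opinion Score is conventionally the arithmetic mean of listener ratings on a category scale, usually reported with a confidence interval. The sketch below is a generic illustration, not the authors' analysis: it assumes a 1 to 5 absolute category rating scale (as defined in ITU-T P.800) and an approximate 95% interval based on the normal quantile 1.96. The function name `mos_with_ci` and the example ratings are hypothetical and do not come from the study.

```python
import math
from statistics import mean, stdev


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, CI half-width) for ratings on an assumed 1-5 scale.

    The half-width is z * sample_std / sqrt(n), i.e. a normal-approximation
    confidence interval around the mean rating.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    m = mean(ratings)
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))
    return m, half_width


# Hypothetical ratings for illustration only; not data from the study.
ratings = {
    "system_a": [5, 4, 4, 5, 3, 4],
    "system_b": [3, 2, 3, 4, 3, 2],
}

for name, scores in ratings.items():
    m, h = mos_with_ci(scores)
    print(f"{name}: MOS = {m:.2f} +/- {h:.2f}")
```

With only a handful of ratings per system, a Student's t quantile gives a more conservative interval than 1.96; the normal value is used here only to keep the sketch free of dependencies beyond the standard library.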


Institutions

Gdańsk University of Technology (PL)

Topics

Speech and dialogue systems · Artificial Intelligence in Healthcare and Education · Speech Recognition and Synthesis
