This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
SIMSAMU - A French medical dispatch dialog open dataset
Citations: 0
Authors: 5
Year: 2025
Abstract
BACKGROUND: Dispatch Services (DS) are essential to Emergency Medical Services (EMS). Dispatchers enable patients to access medical assistance in emergencies, anytime and anywhere, with limited time and resources. AI-based decision-support tools hold great promise for dispatchers, but developing them requires data specific to the medical field. Medical dispatch dialogue is unique: it is a brief phone exchange in an emergency, within a limited time frame, without a physical examination.
OBJECTIVE: Our main objective was to (i) create an open French dataset of medical dispatch dialogues. Our secondary objectives were to (ii) develop a detailed medical dispatch scheme from this dataset using an unsupervised method, and (iii) provide a baseline evaluation of diarization and speech recognition models for this domain in French.
METHODS: From 2022 to 2023, emergency medicine junior doctors simulated real-life medical dispatch calls. These calls were recorded and transcribed to form the SIMSAMU corpus. We developed a dispatch scheme based on (i) recording analysis, (ii) data-driven utterance typology, and (iii) domain expertise. Utterance typology was derived via hierarchical clustering of representations learned by fine-tuning BERT embeddings on SIMSAMU. Clusters were mapped to the Roter Interaction Analysis System (RIAS) and included in our dispatch scheme. SIMSAMU was used to train and evaluate state-of-the-art neural network models for diarization and speech recognition. Diarization used the PyaNet model, fine-tuned on the ESLO2 dataset. Speech recognition used a CTC model with pre-trained wav2vec 2.0 embeddings, compared to the multilingual Whisper model. The CTC-wav2vec model was further fine-tuned on SIMSAMU and evaluated by leave-one-speaker-out cross-validation.
RESULTS: The dataset consists of 61 audio recordings totaling 3 h 14 min. Four clusters were identified for callers and three for dispatchers. Two main dialogue phases were identified: interrogation and contractualization. The diarization model achieved a 10.4% error rate. Speech recognition word error rates were 35.8% for Whisper, 24.8% for the CTC-wav2vec model fine-tuned on ESLO2, and 16.1% after in-domain fine-tuning.
CONCLUSION: We propose a French open medical dispatch dialogue dataset and an expert-validated schema of the medical dispatch dialogue based on unsupervised analysis. Notable gaps in how well speech recognition models generalize underscore the need for targeted, in-domain fine-tuning in this specialized application. SIMSAMU is designed to support this effort by serving as a benchmark for evaluating domain-adapted speech recognition and dialogue modeling strategies.
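The word error rates reported in the abstract can be read as the word-level edit distance between a reference transcript and a model hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `word_error_rate` and the French example sentences are hypothetical, and any text normalization the paper may have applied (casing, punctuation) is not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are split on whitespace, with no further normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example (not taken from SIMSAMU): one deleted word out of five.
example_wer = word_error_rate(
    "le patient ne respire plus",
    "le patient respire plus",
)
```

Under this definition, a score of 16.1% corresponds to roughly one word error per six reference words.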
Institutions
- Inserm (FR)
- Université Paris Cité (FR)
- Sorbonne Université (FR)
- Centre de Recherche des Cordeliers (FR)
- Assistance Publique – Hôpitaux de Paris (FR)
- Hôpital Européen Georges-Pompidou (FR)
- Sorbonne Paris Cité (FR)
- Centre National de la Recherche Scientifique (FR)
- Université Sorbonne Paris Nord (FR)
- Laboratoire de physique des lasers (FR)
- Laboratoire d'Informatique de Paris-Nord (FR)