OpenAlex · Updated hourly · Last updated: 21.03.2026, 16:12

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Improving Arabic Clinical Question Quality through Domain-Adaptive Masked Language Modeling

2025 · 0 citations · 4 authors · Open Access

Abstract

Arabic clinical NLP systems often receive short, vague, or incomplete questions, which leads to weak downstream answers even when the encoder is strong. We address this bottleneck by treating question quality as a first-class, measurable objective. Using domain-adaptive (continued) pretraining with a masked-language-modeling objective (DAPT-MLM) on AHQAD (~808k Arabic health Q–A pairs), we adapt two widely used backbones, AraBERT and the generator variant of AraELECTRA, to the lexical, syntactic, and discourse patterns of well-formed medical questions. Evaluation is aligned with the learning signal: we report cross-entropy (CE) and perplexity (PPL) only at masked tokens, top-k accuracy restricted to masked spans, and lexical-diversity measures to discourage formulaic phrasing. A length-controlled test design (Short/Long/Very Long) separates modeling gains from verbosity. Results show consistent intrinsic improvements for the domain-adapted models; AraBERT-MLM is best overall (macro Top-5 = 0.8392, lowest CE/PPL), outperforming AraBERT (orig.) by +6.0 pp Top-5 and AraELECTRA (orig.) by +17.2 pp. A 200-item human study (clinician + linguist) corroborates these gains (mean ± 95% CI: Clarity 4.12 ± 0.18, Fluency 3.68 ± 0.22, Semantic Fidelity 3.15 ± 0.25, Usefulness 3.42 ± 0.21; substantial agreement, κ ≈ 0.77) and highlights residual semantic drift that informs simple, slot-constrained decoding fixes. Overall, the proposed reformulation module produces more natural and clinically relevant Arabic questions and can be plugged into Arabic clinical QA pipelines as a measurable, tunable front-end.
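The masked-token evaluation described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `masked_metrics` and `macro_average`, the `-100` sentinel for unmasked positions (borrowed from a common deep-learning convention), and the reading of "macro" as an unweighted mean over the Short/Long/Very Long buckets are all assumptions made for illustration.

```python
import math

def masked_metrics(logits, labels, k=5, ignore_index=-100):
    """Cross-entropy, perplexity, and top-k accuracy over masked positions only.

    logits: one list of vocabulary scores per token position.
    labels: gold token id per position, or ignore_index where the token
            was not masked (assumed convention, not from the paper).
    """
    total_nll, hits, n = 0.0, 0, 0
    for scores, gold in zip(logits, labels):
        if gold == ignore_index:
            continue  # unmasked positions do not contribute to any metric
        # Numerically stable log-sum-exp gives the softmax normaliser.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total_nll += log_z - scores[gold]  # negative log-likelihood of gold token
        # Indices of the k highest-scoring vocabulary entries.
        top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        hits += gold in top_k
        n += 1
    if n == 0:
        raise ValueError("no masked positions to evaluate")
    ce = total_nll / n
    return {"ce": ce, "ppl": math.exp(ce), "top_k": hits / n}

def macro_average(per_bucket):
    """Unweighted mean of a metric across length buckets (assumed meaning of 'macro')."""
    return sum(per_bucket.values()) / len(per_bucket)
```

Under these assumptions, `masked_metrics` would be run once per length bucket, and `macro_average` would combine the per-bucket Top-5 values into a single score, so that a large bucket cannot dominate the headline number.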


Topics

Topic Modeling · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education