This is an overview page with metadata for this scholarly work. An external link to the full text is not currently available.
Detecting and understanding stigmatizing language in electronic health records using natural language processing
Citations: 0
Authors: 1
Year: 2026
Abstract
Stigmatizing language (SL) such as negative descriptors, misgendering, or expressions of disbelief used in Electronic Health Records (EHRs) can perpetuate bias, erode patient trust, and reinforce healthcare disparities. Building on the growing intersection of health informatics and natural language processing (NLP), this dissertation integrates methodological innovation with fairness-oriented analysis across two complementary case studies to document, detect, and counter stigmatizing language in EHRs. Case 1 introduces a multi-stage transfer learning (MSTL) framework that sequentially adapts transformer-based models through semantic, syntactic, and task-specific fine-tuning. Using datasets spanning hate speech, clinical phenotypes, and stigmatizing-language corpora, the framework achieved an accuracy of 89.83% and an F1 score of 93.18, significantly outperforming traditional baselines and large-language-model comparisons (e.g., GPT-4o). Statistical validation through Wilcoxon-Mann-Whitney tests with Bonferroni correction confirmed the robustness of the performance gains (p < .05). The MSTL-Longformer model demonstrated consistent accuracy across demographic subgroups, highlighting its capacity to detect subtle and context-dependent forms of stigma in long clinical narratives. Case 2 extends this framework to fairness auditing, baseline modeling, and interpretive contextualization for gender-expansive patient (GEP) documentation. Multivariable logistic-regression and odds-ratio analyses revealed that GEP and Black/African American patients had the highest adjusted odds of stigmatizing documentation, underscoring intersectional disparities in clinical narratives. Baseline models for SL detection among GEPs established comparative benchmarks for future studies, while comprehensive annotation guidelines were developed to standardize the identification of ambiguous expressions, family-attributed remarks, and misgendering.
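The statistical validation described above can be illustrated with a minimal, standard-library-only sketch (the per-run F1 scores and group sizes below are hypothetical, not the dissertation's data): a pairwise-count Wilcoxon-Mann-Whitney U statistic with the normal approximation for the two-sided p-value, followed by a Bonferroni-corrected decision at a family-wise alpha of 0.05.

```python
import math
from itertools import product

def mann_whitney_u(x, y):
    """Two-sided Wilcoxon-Mann-Whitney test (normal approximation).
    U counts pairwise wins of x over y; ties contribute 0.5."""
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0
            for a, b in product(x, y))
    n1, n2 = len(x), len(y)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p

def bonferroni(p_values, alpha=0.05):
    """Reject H0 for each test whose p-value beats alpha / m,
    controlling the family-wise error rate over m comparisons."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]

# Hypothetical per-seed F1 scores: proposed model vs. three baselines.
mstl  = [93.1, 93.4, 92.9, 93.3, 93.0, 93.2]
base1 = [88.2, 88.9, 87.5, 88.0, 88.4, 88.1]
base2 = [85.0, 85.6, 84.9, 85.2, 85.1, 85.4]
base3 = [90.1, 90.6, 89.8, 90.3, 90.0, 90.2]

p_vals = [mann_whitney_u(mstl, b)[1] for b in (base1, base2, base3)]
print(bonferroni(p_vals))  # → [True, True, True]
```

The Bonferroni threshold here is 0.05 / 3 ≈ 0.0167 per comparison; a scipy/statsmodels pipeline (`scipy.stats.mannwhitneyu`, `statsmodels.stats.multitest.multipletests`) would replace this sketch in practice.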
Together, these studies demonstrate that detecting linguistic bias in EHRs is both a computational and an ethical challenge. The proposed frameworks show that domain-progressive transfer learning can substantially improve model accuracy, while fairness-aware evaluation exposes structural inequities in documentation. By combining advanced NLP modeling with ethical inquiry, this dissertation contributes new methodological and conceptual tools for responsible AI in healthcare. The findings illustrate how linguistic equity can be operationalized through data-driven innovations, ensuring that the language of medicine, and the algorithms that process it, promote inclusion, transparency, and justice in patient care.
Similar works
"Why Should I Trust You?"
2016 · 14,588 citations
A Comprehensive Survey on Graph Neural Networks
2020 · 8,861 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,423 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,917 citations
Artificial intelligence in healthcare: past, present and future
2017 · 4,494 citations