This is an overview page with metadata for this scholarly work. An external link to the full text is not currently available.
Detecting and understanding stigmatizing language in electronic health records using natural language processing
Citations: 0
Authors: 1
Year: 2026
Abstract
Stigmatizing language (SL) such as negative descriptors, misgendering, or expressions of disbelief used in Electronic Health Records (EHRs) can perpetuate bias, erode patient trust, and reinforce healthcare disparities. Building on the growing intersection of health informatics and natural language processing (NLP), this dissertation integrates methodological innovation with fairness-oriented analysis across two complementary case studies to document, detect, and counter stigmatizing language in EHRs. Case 1 introduces a multi-stage transfer learning (MSTL) framework that sequentially adapts transformer-based models through semantic, syntactic, and task-specific fine-tuning. Using datasets spanning hate speech, clinical phenotypes, and stigmatizing-language corpora, the framework achieved an accuracy of 89.83% and an F1 score of 93.18, significantly outperforming traditional baselines and large-language-model comparisons (e.g., GPT-4o). Statistical validation through Wilcoxon-Mann-Whitney tests with Bonferroni correction confirmed the robustness of the performance gains (p < .05). The MSTL-Longformer model demonstrated consistent accuracy across demographic subgroups, highlighting its capacity to detect subtle and context-dependent forms of stigma in long clinical narratives. Case 2 extends this framework to fairness auditing, baseline modeling, and interpretive contextualization for gender-expansive patient (GEP) documentation. Multivariable logistic-regression and odds-ratio analyses revealed that GEP and Black/African American patients had the highest adjusted odds of stigmatizing documentation, underscoring intersectional disparities in clinical narratives. Baseline models for SL detection among GEPs established comparative benchmarks for future studies, while comprehensive annotation guidelines were developed to standardize the identification of ambiguous expressions, family-attributed remarks, and misgendering.
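The statistical validation described above can be illustrated with a minimal, standard-library-only sketch (the per-run F1 scores and group sizes below are hypothetical, not the dissertation's data): a pairwise-count Wilcoxon-Mann-Whitney U statistic with the normal approximation for the two-sided p-value, followed by a Bonferroni-corrected decision at a family-wise alpha of 0.05.

```python
import math
from itertools import product

def mann_whitney_u(x, y):
    """Two-sided Wilcoxon-Mann-Whitney test (normal approximation).
    U counts pairwise wins of x over y; ties contribute 0.5."""
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0
            for a, b in product(x, y))
    n1, n2 = len(x), len(y)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p

def bonferroni(p_values, alpha=0.05):
    """Reject H0 for each test whose p-value beats alpha / m,
    controlling the family-wise error rate over m comparisons."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]

# Hypothetical per-seed F1 scores: proposed model vs. three baselines.
mstl  = [93.1, 93.4, 92.9, 93.3, 93.0, 93.2]
base1 = [88.2, 88.9, 87.5, 88.0, 88.4, 88.1]
base2 = [85.0, 85.6, 84.9, 85.2, 85.1, 85.4]
base3 = [90.1, 90.6, 89.8, 90.3, 90.0, 90.2]

p_vals = [mann_whitney_u(mstl, b)[1] for b in (base1, base2, base3)]
print(bonferroni(p_vals))  # → [True, True, True]
```

The Bonferroni threshold here is 0.05 / 3 ≈ 0.0167 per comparison; a scipy/statsmodels pipeline (`scipy.stats.mannwhitneyu`, `statsmodels.stats.multitest.multipletests`) would replace this sketch in practice.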
Together, these studies demonstrate that detecting linguistic bias in EHRs is both a computational and an ethical challenge. The proposed frameworks show that domain-progressive transfer learning can substantially improve model accuracy, while fairness-aware evaluation exposes structural inequities in documentation. By combining advanced NLP modeling with ethical inquiry, this dissertation contributes new methodological and conceptual tools for responsible AI in healthcare. The findings illustrate how linguistic equity can be operationalized through data-driven innovations, ensuring that the language of medicine, and the algorithms that process it, promote inclusion, transparency, and justice in patient care.
Similar works
"Why Should I Trust You?"
2016 · 14,588 citations
A Comprehensive Survey on Graph Neural Networks
2020 · 8,861 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,423 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,917 citations
Artificial intelligence in healthcare: past, present and future
2017 · 4,494 citations