Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
Coding Inequity: Assessing GPT-4’s Potential for Perpetuating Racial and Gender Biases in Healthcare
30
Zitationen
12
Autoren
2023
Jahr
Abstract
Abstract Background Large language models (LLMs) such as GPT-4 hold great promise as transformative tools in healthcare, ranging from automating administrative tasks to augmenting clinical decision- making. However, these models also pose a serious danger of perpetuating biases and delivering incorrect medical diagnoses, which can have a direct, harmful impact on medical care. Methods Using the Azure OpenAI API, we tested whether GPT-4 encodes racial and gender biases and examined the impact of such biases on four potential applications of LLMs in the clinical domain—namely, medical education, diagnostic reasoning, plan generation, and patient assessment. We conducted experiments with prompts designed to resemble typical use of GPT-4 within clinical and medical education applications. We used clinical vignettes from NEJM Healer and from published research on implicit bias in healthcare. GPT-4 estimates of the demographic distribution of medical conditions were compared to true U.S. prevalence estimates. Differential diagnosis and treatment planning were evaluated across demographic groups using standard statistical tests for significance between groups. Findings We find that GPT-4 does not appropriately model the demographic diversity of medical conditions, consistently producing clinical vignettes that stereotype demographic presentations. The differential diagnoses created by GPT-4 for standardized clinical vignettes were more likely to include diagnoses that stereotype certain races, ethnicities, and gender identities. Assessment and plans created by the model showed significant association between demographic attributes and recommendations for more expensive procedures as well as differences in patient perception. Interpretation Our findings highlight the urgent need for comprehensive and transparent bias assessments of LLM tools like GPT-4 for every intended use case before they are integrated into clinical care. We discuss the potential sources of these biases and potential mitigation strategies prior to clinical implementation.
Ähnliche Arbeiten
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8.231 Zit.
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8.084 Zit.
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7.444 Zit.
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5.776 Zit.
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5.423 Zit.
Autoren
Institutionen
- University of California, San Francisco(US)
- UCSF Helen Diller Family Comprehensive Cancer Center(US)
- Massachusetts Institute of Technology(US)
- Stanford Medicine(US)
- Stanford University(US)
- Brigham and Women's Hospital(US)
- Beth Israel Deaconess Medical Center(US)
- Harvard University(US)
- Hadassah Medical Center(IL)
- Cancer Research And Biostatistics(US)
- Emory University(US)
- Center for Applied Linguistics(US)
- Executive Office of the President(US)
- California Digital Library(US)
- University of California Office of the President(US)