This is an overview page with metadata for this scientific publication. The full article is available from the publisher.
Alignment of large language models in solving medical ethical dilemmas
Citations: 0
Authors: 8
Year: 2026
Abstract
Background: Large language models (LLMs) are entering clinical workflows, yet their alignment with medical-ethical principles is unclear.
Objective: To evaluate how LLMs reconcile deontological and utilitarian reasoning in medically adapted "Trolley dilemma" scenarios.
Methods: We tested 14 LLMs, including GPT-o1-preview and DeepSeek-R1-Distill-Llama, on five medically adapted Trolley dilemmas drawn from philosophical and real-world scenarios. Each model generated 10 independent answers per case (700 responses in total). Responses were forced to "yes" (utilitarian) or "no" (deontological). The primary outcome was the proportion of utilitarian choices; χ² tests compared models.
Results: LLM responses varied widely. Some models, such as GPT-4 and Gemma, selected the utilitarian option in 20% of cases, while others, such as Qwen-2-7B, reached 80% (p < .001). GPT-o1-preview and DeepSeek-R1-Distill-Llama selected utilitarian responses in 44% and 38% of cases, respectively. Some models selected the consequence-maximizing option in vignettes where doing so required overriding consent or endorsing intentional harm within the scenario framing. Seven of the fourteen models tested recommended impermissible medical actions, including nonconsensual limb amputation, killing a healthy person for organ harvesting, and transferring a patient to hospice against their wishes, in 44 of 700 outputs (6.3%).
Conclusions: In this constrained forced-choice stress test, models occasionally endorsed boundary-violating actions within the vignette framing. These results do not generalize to routine clinical decision-making, but they motivate further benchmarked evaluation of refusal behavior and boundary constraints in ethically sensitive medical contexts. We must carefully define how these models influence clinical decisions, establish robust ethical guidelines for their use, and ensure that fundamental ethical norms are not compromised.
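To make the Methods concrete, the following is a minimal sketch (not the authors' code) of the described analysis: tallying each model's forced-choice answers and testing whether utilitarian-choice proportions differ between models with a χ² test. The model names and counts are hypothetical placeholders, consistent only with the study design of 5 dilemmas × 10 repetitions = 50 responses per model.

from scipy.stats import chi2_contingency

# Each model answered 5 dilemmas x 10 repetitions = 50 forced-choice
# responses; "yes" counts as utilitarian, "no" as deontological.
RESPONSES_PER_MODEL = 5 * 10

# Hypothetical utilitarian-choice counts per model (out of 50 each),
# roughly matching the ~20% vs. ~80% contrast reported in Results.
utilitarian_counts = {"model_low": 10, "model_high": 40}

# Contingency table: one row per model,
# columns = (utilitarian, deontological) response counts.
table = [
    [n_yes, RESPONSES_PER_MODEL - n_yes]
    for n_yes in utilitarian_counts.values()
]

chi2, p, dof, _expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4g}")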
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,693 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,598 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,124 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,871 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations