Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

mFARM: Towards Multi-Faceted Fairness Assessment based on HARMs in Clinical Decision Support

2025·0 Zitationen·ArXiv.orgOpen Access

Volltext beim Verlag öffnen

Zitationen

Autoren

2025

Jahr

Abstract

The deployment of Large Language Models (LLMs) in high-stakes medical settings poses a critical AI alignment challenge, as models can inherit and amplify societal biases, leading to significant disparities. Existing fairness evaluation methods fall short in these contexts as they typically use simplistic metrics that overlook the multi-dimensional nature of medical harms. This also promotes models that are fair only because they are clinically inert, defaulting to safe but potentially inaccurate outputs. To address this gap, our contributions are mainly two-fold: first, we construct two large-scale, controlled benchmarks (ED-Triage and Opioid Analgesic Recommendation) from MIMIC-IV, comprising over 50,000 prompts with twelve race x gender variants and three context tiers. Second, we propose a multi-metric framework - Multi-faceted Fairness Assessment based on hARMs ($mFARM$) to audit fairness for three distinct dimensions of disparity (Allocational, Stability, and Latent) and aggregate them into an $mFARM$ score. We also present an aggregated Fairness-Accuracy Balance (FAB) score to benchmark and observe trade-offs between fairness and prediction accuracy. We empirically evaluate four open-source LLMs (Mistral-7B, BioMistral-7B, Qwen-2.5-7B, Bio-LLaMA3-8B) and their finetuned versions under quantization and context variations. Our findings showcase that the proposed $mFARM$ metrics capture subtle biases more effectively under various settings. We find that most models maintain robust performance in terms of $mFARM$ score across varying levels of quantization but deteriorate significantly when the context is reduced. Our benchmarks and evaluation code are publicly released to enhance research in aligned AI for healthcare.

Autoren

Themen

Artificial Intelligence in Healthcare and EducationAdversarial Robustness in Machine LearningEthics and Social Impacts of AI

Volltext beim Verlag öffnen

mFARM: Towards Multi-Faceted Fairness Assessment based on HARMs in Clinical Decision Support

Abstract

Ähnliche Arbeiten

Autoren

Themen