Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
When helpfulness backfires: LLMs and the risk of false medical information due to sycophantic behavior
16
Zitationen
9
Autoren
2025
Jahr
Abstract
Large language models (LLMs) exhibit a vulnerability arising from being trained to be helpful: a tendency to comply with illogical requests that would generate false information, even when they have the knowledge to identify the request as illogical. This study investigated this vulnerability in the medical domain, evaluating five frontier LLMs using prompts that misrepresent equivalent drug relationships. We tested baseline sycophancy, the impact of prompts allowing rejection and emphasizing factual recall, and the effects of fine-tuning on a dataset of illogical requests, including out-of-distribution generalization. Results showed high initial compliance (up to 100%) across all models, prioritizing helpfulness over logical consistency. Prompt engineering and fine-tuning improved performance, improving rejection rates on illogical requests while maintaining general benchmark performance. This demonstrates that prioritizing logical consistency through targeted training and prompting is crucial for mitigating the risk of generating false medical information and ensuring the safe deployment of LLMs in healthcare.
Ähnliche Arbeiten
Autoren
Institutionen
- Brigham and Women's Hospital(US)
- Boston Children's Hospital(US)
- Harvard University(US)
- Dana-Farber Cancer Institute(US)
- Dana-Farber Brigham Cancer Center(US)
- Mass General Brigham(US)
- Massachusetts Institute of Technology(US)
- Johns Hopkins University(US)
- University of Virginia(US)
- Maastricht University(NL)
- Artificial Intelligence in Medicine (Canada)(CA)