This is an overview page with metadata for this scientific publication. The full article is available from the publisher.
The algorithm will see you now: how AI evaluates neurosurgeons
Citations: 0
Authors: 6
Year: 2025
Abstract
As artificial intelligence (AI) increasingly informs healthcare, understanding how large language models (LLMs) evaluate medical professionals is critical. This study quantified biases when LLMs assess neurosurgeon competency using demographic and practice characteristics. We prompted three prominent LLMs (ChatGPT-4o, Claude 3.7 Sonnet, and DeepSeek-V3) to score 6,500 synthetic neurosurgeon profiles. Profiles were created using demographically diverse names derived from public databases and randomly assigned professional attributes (experience, publications, institution, region, specialty), with statistical validation ensuring even distribution across groups. Multivariate regression analysis quantified how each factor influenced competency scores (0–100). Despite identical profiles, LLMs produced inconsistent mean (SD) scores: ChatGPT 91.85 (6.60), DeepSeek 71.74 (10.30), and Claude 62.29 (13.59). All models showed regional biases; North American neurosurgeons received scores 3.09 (ChatGPT) and 2.48 (DeepSeek) points higher than identical African counterparts (P < .001). ChatGPT penalized East Asian (−0.83), South Asian (−0.91), and Middle Eastern (−0.80) neurosurgeons (P < .001). Practice-setting bias was stronger, with ChatGPT and DeepSeek penalizing independent practitioners by 4.15 and 3.00 points, respectively, compared to hospital-employed peers (P < .001). Models also displayed inconsistent bias correction, with ChatGPT elevating scores for female (+1.61) and Black-American (+1.69) neurosurgeons while disadvantaging other groups (P < .001). This study provides evidence that LLMs incorporate distinct biases when evaluating neurosurgeons. As AI integration accelerates, uncritical adoption risks a self-reinforcing system in which algorithmically preferred practitioners receive disproportionate advantages, independent of actual skill. These systems may also undermine global capacity-building by devaluing non-Western practitioners. Understanding and mitigating these biases is fundamental to responsibly navigating the intersection of medicine and AI.
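For readers interested in the quantification step described in the abstract, the sketch below illustrates, under stated assumptions, how synthetic profiles could be assigned random attributes, scored by an LLM, and analyzed with a multivariate regression in which each coefficient is a score shift relative to a reference group. The attribute levels, the score_with_llm placeholder, and the statsmodels formula are illustrative assumptions, not the authors' actual code or prompts.

```python
# Illustrative sketch (assumptions, not the authors' code): score synthetic
# neurosurgeon profiles and quantify per-factor effects with multivariate OLS.
import random

import pandas as pd
import statsmodels.formula.api as smf

random.seed(0)

# Hypothetical attribute levels standing in for the paper's profile factors.
REGIONS = ["North America", "Europe", "Africa", "East Asia", "South Asia", "Middle East"]
SETTINGS = ["hospital-employed", "independent"]
GENDERS = ["male", "female"]


def make_profiles(n: int = 6500) -> pd.DataFrame:
    """Randomly assign attributes to n synthetic profiles."""
    return pd.DataFrame({
        "region": [random.choice(REGIONS) for _ in range(n)],
        "setting": [random.choice(SETTINGS) for _ in range(n)],
        "gender": [random.choice(GENDERS) for _ in range(n)],
        "experience_yrs": [random.randint(1, 35) for _ in range(n)],
        "publications": [random.randint(0, 200) for _ in range(n)],
    })


def score_with_llm(profile: pd.Series) -> float:
    """Placeholder for an LLM call that returns a 0-100 competency score."""
    return random.uniform(50, 100)  # replace with the actual model query


df = make_profiles()
df["score"] = df.apply(score_with_llm, axis=1)

# Dummy-coded multivariate regression: each coefficient is the score shift
# relative to the reference level of that factor, holding the others fixed.
model = smf.ols(
    "score ~ C(region) + C(setting) + C(gender) + experience_yrs + publications",
    data=df,
).fit()
print(model.summary())
```

In such a setup, a coefficient of roughly −3 for an African region dummy relative to a North American reference would correspond to the kind of regional penalty the study reports; the placeholder scorer here returns random values, so its coefficients are not meaningful.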
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,200 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,051 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,416 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,410 citations