OpenAlex · Updated hourly · Last updated: 12.03.2026, 18:32

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

The algorithm will see you now: how AI evaluates neurosurgeons

2025 · 0 citations · AI and Ethics · Open Access

Citations: 0 · Authors: 6 · Year: 2025

Abstract

As artificial intelligence (AI) increasingly informs healthcare, understanding how large language models (LLMs) evaluate medical professionals is critical. This study quantified biases when LLMs assess neurosurgeon competency using demographic and practice characteristics. We prompted three prominent LLMs (ChatGPT-4o, Claude 3.7 Sonnet, and DeepSeek-V3) to score 6,500 synthetic neurosurgeon profiles. Profiles were created using demographically diverse names derived from public databases and randomly assigned professional attributes (experience, publications, institution, region, specialty), with statistical validation ensuring even distribution across groups. Multivariate regression analysis quantified how each factor influenced competency scores (0–100). Despite identical profiles, LLMs produced inconsistent mean (SD) scores: ChatGPT 91.85 (6.60), DeepSeek 71.74 (10.30), and Claude 62.29 (13.59). All models showed regional biases; North American neurosurgeons received scores 3.09 (ChatGPT) and 2.48 (DeepSeek) points higher than identical African counterparts (P < .001). ChatGPT penalized East Asian (−0.83), South Asian (−0.91), and Middle Eastern (−0.80) neurosurgeons (P < .001). Practice-setting bias was stronger, with ChatGPT and DeepSeek penalizing independent practitioners by 4.15 and 3.00 points, respectively, compared to hospital-employed peers (P < .001). Models also displayed inconsistent bias correction, with ChatGPT elevating scores for female (+1.61) and Black-American (+1.69) neurosurgeons while disadvantaging other groups (P < .001). This study provides evidence that LLMs incorporate distinct biases when evaluating neurosurgeons. As AI integration accelerates, uncritical adoption risks a self-reinforcing system in which algorithmically preferred practitioners receive disproportionate advantages independent of actual skill. These systems may also undermine global capacity-building by devaluing non-Western practitioners. Understanding and mitigating these biases is fundamental to responsibly navigating the intersection of medicine and AI.
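To make the study design concrete, here is a minimal sketch of the kind of multivariate regression the abstract describes: dummy-coded profile attributes regressed against competency scores. All variable names, planted effect sizes, and the noise level are illustrative assumptions for demonstration, not the paper's actual data or code; the real analysis covered more attributes (experience, publications, institution, specialty) and three separate models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative synthetic profiles: dummy-coded attributes (NOT the paper's data).
n = 6500
region_na = rng.integers(0, 2, n)    # 1 = North America, 0 = Africa (reference)
independent = rng.integers(0, 2, n)  # 1 = independent practice, 0 = hospital-employed
female = rng.integers(0, 2, n)       # 1 = female, 0 = male (reference)

# Planted effects loosely mirroring the reported magnitudes:
# intercept, region bonus, practice-setting penalty, gender adjustment.
true_beta = np.array([70.0, 3.0, -4.0, 1.6])
X = np.column_stack([np.ones(n), region_na, independent, female])
scores = X @ true_beta + rng.normal(0.0, 10.0, n)  # noisy "LLM" scores

# Multivariate OLS: how much each attribute shifts the score, holding
# the others fixed. With n = 6500 the planted effects are recovered closely.
beta_hat, *_ = np.linalg.lstsq(X, scores, rcond=None)
print(beta_hat.round(2))
```

The fitted coefficients correspond directly to the point differences reported in the abstract (e.g. the region coefficient is the score gap between otherwise identical North American and African profiles).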
