This is an overview page containing metadata for this scientific work. The full article is available from the publisher.
Assessment of Physician Preferences for Large Language Model–Generated Responses Across Geographic Regions and Clinical Experience Levels: Preliminary Survey Study (Preprint)
Citations: 0
Authors: 8
Year: 2025
Abstract
<sec> <title>BACKGROUND</title> Large language models (LLMs) have demonstrated increasing capabilities in generating clinically coherent and accurate responses to patient questions, in some cases outperforming physicians in terms of accuracy and empathy. However, little is known about how physicians across geographic regions and levels of clinical experience evaluate these artificial intelligence (AI)–generated responses compared to those authored by human clinicians. </sec> <sec> <title>OBJECTIVE</title> This study examined physician evaluations of LLM-generated versus physician-authored responses to real-world patient questions, comparing preference patterns across geographic regions and years in clinical practice. </sec> <sec> <title>METHODS</title> We conducted a cross-sectional online survey between March and May 2025 among licensed physicians recruited internationally. Participants reviewed anonymized medical responses from 2 LLMs (GPT-4.0 and Meta AI) and verified physicians to questions sourced from Reddit’s r/AskDocs forum. Each participant ranked 3 responses per question (1=most preferred; 3=least preferred) according to accuracy and responsiveness. Mean ranks, pairwise win proportions, and full rank distributions were analyzed descriptively and stratified by geographic region and years in practice. </sec> <sec> <title>RESULTS</title> Overall, LLM-generated responses were strongly preferred. GPT-4.0 achieved the best mean rank (1.63, SD 0.68; 95% CI 1.52-1.74), followed by Meta AI (1.83, SD 0.72; 95% CI 1.71-1.94), while verified physician-authored responses were least preferred (2.53, SD 0.76; 95% CI 2.40-2.65). In pairwise analyses, responses generated by GPT-4.0 won 78% (118/150) of the head-to-head comparisons versus physician-authored responses and 57% (86/150) versus Meta AI responses. 
Preference for GPT-4.0 was most pronounced in Africa (mean 1.59, SD 0.72), Asia (mean 1.91, SD 0.83), and North America (mean 1.55, SD 0.60), while Meta AI held a slight lead in Europe (mean 1.33, SD 0.57) and the Americas (mean 1.75). Across experience levels, physicians with less than 5 years in practice (28/52, 54%) ranked GPT-4.0 most favorably (mean 1.58, SD 0.63), followed by those with 10 to 15 years of experience (mean 1.56, SD 0.72). Even among physicians with more than 15 years in practice (9/52, 17%), AI-generated responses outperformed physician-authored responses (mean 1.75 vs 2.62). Across all subgroups, human-authored responses were ranked lowest. </sec> <sec> <title>CONCLUSIONS</title> This exploratory study demonstrates that physicians across diverse regions and experience levels generally prefer LLM-generated responses over human-authored ones. The consistency of this finding across continents and practice durations underscores growing professional acceptance of AI as a viable tool for patient communication. These results suggest that modern LLMs, particularly GPT-4.0, may provide clinically acceptable, contextually relevant, and user-trusted health information, with potential to augment physician workflows and patient education. </sec>