OpenAlex · Updated hourly · Last updated: 20 March 2026, 23:42

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Performance of large language models in preoperative and postoperative counselling for aesthetic facial procedures

2026 · 0 citations · British Journal of Oral and Maxillofacial Surgery · Open Access

Citations: 0 · Authors: 4 · Year: 2026

Abstract

Large language models (LLMs) are increasingly used in healthcare, but their role in aesthetic surgical procedures remains unexplored. These interventions present unique challenges, marked by high patient expectations, emotionally charged decision-making, and subtle yet impactful outcomes on self-perception and psychosocial health. This cross-sectional in silico study evaluated the performance of ChatGPT-4 (OpenAI, 2025), DeepSeek V3 (DeepSeek AI/High-Flyer, 2025), and Gemini 2.5 Pro Experimental (Google, 2025) in preoperative and postoperative counselling for aesthetic facial surgery. Twenty-six standardised patient-oriented questions were submitted, and the anonymised responses of the chatbots were independently assessed by two calibrated oral and maxillofacial surgeons across four domains: accuracy, empathy, readability (Flesch-Kincaid Reading Ease (FKRE) and Grade Level (FKGL)), and referencing reliability (including the identification of fabricated or non-verifiable citations, a phenomenon referred to as "hallucination" in LLM outputs). Statistical tests included Kruskal-Wallis, Mann-Whitney U with Bonferroni correction, Spearman correlation, and chi-squared. DeepSeek achieved the highest accuracy (4.77 (0.51), p = 0.0078) and readability (FKRE 2.92 (0.27), p < 0.00001), while Gemini outperformed in empathy (4.08 (0.89), p < 0.001). GPT-4 produced the most hallucinated citations (36%) compared with Gemini (14%) and DeepSeek (8.8%) (p < 0.00001). A negative correlation between empathy and readability (r = -0.34, p = 0.002) suggested a trade-off between affective tone and accessibility. Overall, LLMs generated satisfactory counselling responses with distinct performance profiles, supporting their potential in patient-centred communication while reinforcing the need for human oversight.
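The two readability measures named in the abstract can be illustrated with the standard published Flesch formulas. The sketch below is a minimal illustration, not the study's method: the abstract does not say which tool the authors used, and the helper names (`count_syllables`, `text_counts`) and the vowel-group syllable heuristic are assumptions made here for demonstration.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (assumed heuristic): count vowel groups,
    discounting a trailing silent 'e' except in '-le' endings."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_counts(text: str) -> tuple[int, int, int]:
    """Return (sentences, words, syllables) for a plain-text passage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def flesch_reading_ease(sentences: int, words: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(sentences: int, words: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    sample = "Swelling is normal after surgery. Call your surgeon if pain gets worse."
    counts = text_counts(sample)
    print("Reading ease:", round(flesch_reading_ease(*counts), 2))
    print("Grade level:", round(flesch_kincaid_grade(*counts), 2))
```

Because syllable counting is heuristic, scores from this sketch may differ somewhat from those produced by dedicated readability tools.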

Similar works