OpenAlex · Updated hourly · Last updated: 20.03.2026, 10:52

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

A protocol for evaluating the accuracy and reliability of large language models in answering patient questions on breast reconstruction surgery

2025 · 1 citation · Journal of Surgical Protocols and Research Methodologies · Open Access

Citations: 1 · Authors: 7 · Year: 2025

Abstract

Background: Large language models (LLMs) are increasingly used in healthcare settings to provide patient education and answer medical inquiries. However, their reliability in delivering accurate, clear and unbiased information remains uncertain. This study aims to evaluate the quality of responses generated by LLMs to common patient questions regarding breast reconstruction surgery.

Methods: A total of 60 patient-oriented questions related to breast reconstruction will be selected from professional bodies, patient support groups and social media platforms. These questions will be categorized into six main topics: fundamental knowledge, preoperative considerations, surgical procedures, procedural risks and postoperative complications, preparation and recovery, and miscellaneous concerns. Seven LLMs (ChatGPT 4o, Claude, Copilot, DeepSeek, Gemini, Grok and OpenEvidence) will be tested by inputting each question twice, each time using the ‘New Chat’ feature, to assess response consistency. Responses will be evaluated by 10 board-certified plastic surgeons using a structured scoring rubric covering five criteria: accuracy, clarity & appropriateness, completeness, and user engagement & reassurance. A three-point scoring system will be employed, with penalty deductions for missing or misleading information. Inter-rater reliability will be measured to ensure consistency among evaluators.

Discussion: By systematically assessing the responses of multiple LLMs to patient inquiries on breast reconstruction, this study will provide insights into their reliability and clinical applicability. Findings may help refine LLM-based tools for patient education and identify areas requiring improvement to ensure safe and effective AI-assisted communication in plastic surgery.
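The design described in the Methods can be sketched in a few lines of Python. This is an illustration only, not the authors' analysis code. The abstract does not name the inter-rater reliability statistic, so Fleiss' kappa is assumed here as one common choice for several raters assigning categorical scores. The function names `response_grid` and `fleiss_kappa` and the layout of the rating counts are hypothetical.

```python
import math
from itertools import product

# Model names, question count and run count are taken from the abstract.
MODELS = ["ChatGPT 4o", "Claude", "Copilot", "DeepSeek",
          "Gemini", "Grok", "OpenEvidence"]
N_QUESTIONS = 60
N_RUNS = 2  # each question is entered twice, in a fresh chat each time


def response_grid(models, n_questions, n_runs):
    """Enumerate every (model, question number, run number) to be collected."""
    return list(product(models,
                        range(1, n_questions + 1),
                        range(1, n_runs + 1)))


def fleiss_kappa(counts):
    """Fleiss' kappa (assumed statistic; the abstract does not specify one).

    counts[i][j] is the number of raters who gave response i the j-th score.
    Every row must sum to the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each response needs the same number of raters (>= 2)")
    n_cats = len(counts[0])
    total = n_items * n_raters
    # Share of all ratings that fall into each score category.
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_cats)]
    # Observed agreement per response, then averaged over responses.
    p_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in counts]
    p_obs = sum(p_item) / n_items
    # Agreement expected by chance.
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1.0:
        return math.nan  # undefined when every rating is in one category
    return (p_obs - p_exp) / (1.0 - p_exp)


if __name__ == "__main__":
    grid = response_grid(MODELS, N_QUESTIONS, N_RUNS)
    print(len(grid))  # 7 models x 60 questions x 2 runs = 840 responses

    # Hypothetical counts: 10 raters, three score levels, three responses.
    example = [[8, 2, 0], [1, 7, 2], [0, 3, 7]]
    print(round(fleiss_kappa(example), 3))
```

With seven models, 60 questions and two runs, the grid holds 840 responses. With 10 raters and a three-point scale, each row of the count table has three entries that sum to 10. How penalty deductions enter the scores is not described in the abstract and is left out of the sketch.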

Similar works