This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
A protocol for evaluating the accuracy and reliability of large language models in answering patient questions on breast reconstruction surgery
Citations: 1
Authors: 7
Year: 2025
Abstract
Background: Large language models (LLMs) are increasingly used in healthcare settings to provide patient education and answer medical inquiries. However, their reliability in delivering accurate, clear and unbiased information remains uncertain. This study aims to evaluate the quality of responses generated by LLMs to common patient questions regarding breast reconstruction surgery.
Methods: A total of 60 patient-oriented questions related to breast reconstruction will be selected from professional bodies, patient support groups and social media platforms. These questions will be categorized into six main topics: fundamental knowledge, preoperative considerations, surgical procedures, procedural risks and postoperative complications, preparation and recovery, and miscellaneous concerns. Seven LLMs (ChatGPT 4o, Claude, Copilot, DeepSeek, Gemini, Grok and OpenEvidence) will be tested by inputting each question twice using the 'New Chat' feature to assess response consistency. Responses will be evaluated by 10 board-certified plastic surgeons using a structured scoring rubric covering five criteria: accuracy, clarity & appropriateness, completeness, and user engagement & reassurance. A three-point scoring system will be employed, with penalty deductions for missing or misleading information. Inter-rater reliability will be measured to ensure consistency among evaluators.
Discussion: By systematically assessing the responses of multiple LLMs to patient inquiries on breast reconstruction, this study will provide insights into their reliability and clinical applicability. Findings may help refine LLM-based tools for patient education and identify areas requiring improvement to ensure safe and effective AI-assisted communication in plastic surgery.
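The design described in the Methods can be sketched in a few lines of Python to show the size of the rating task and one possible way to quantify inter-rater reliability. The counts (60 questions, 7 models, 2 repeats, 10 raters) come from the abstract. Everything else is an assumption made for illustration: the abstract does not name a reliability statistic, so Fleiss' kappa is used here as one common choice for multiple raters, and the three-point scale is assumed to be coded as 1, 2, 3. The function names `response_ids` and `fleiss_kappa` are hypothetical.

```python
from collections import Counter
from itertools import product

# Counts taken from the abstract.
N_QUESTIONS = 60
MODELS = ["ChatGPT 4o", "Claude", "Copilot", "DeepSeek",
          "Gemini", "Grok", "OpenEvidence"]
N_REPEATS = 2   # each question is entered twice in a new chat
N_RATERS = 10   # board-certified plastic surgeons


def response_ids():
    """Enumerate every (question, model, repeat) response in the design."""
    return list(product(range(1, N_QUESTIONS + 1), MODELS,
                        range(1, N_REPEATS + 1)))


def fleiss_kappa(ratings):
    """Fleiss' kappa for categorical ratings (assumed statistic, not from the abstract).

    `ratings` is a list of items; each item is a list with one category
    label per rater. Every item must have the same number of raters (>= 2).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    category_totals = Counter()
    agreement_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Proportion of agreeing rater pairs for this item.
        agreement_sum += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1))

    observed = agreement_sum / n_items
    total = n_items * n_raters
    expected = sum((t / total) ** 2 for t in category_totals.values())
    if expected == 1.0:
        return float("nan")  # only one category was ever used; kappa is undefined
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    n_responses = len(response_ids())
    print("responses to rate:", n_responses)
    print("individual ratings per criterion:", n_responses * N_RATERS)
```

Under these assumptions the design yields 60 × 7 × 2 = 840 responses, so each rubric criterion would receive 8,400 individual ratings if all 10 surgeons score every response. Whether every surgeon rates every response is not stated in the abstract.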
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,260 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,116 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,493 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,438 citations