This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
ATheNa-Breast: A real-world pilot of an artificial intelligence (AI) chatbot using therapy guidelines to navigate the growing complexity in breast cancer.
Citations: 0
Authors: 8
Year: 2025
Abstract
e13623 Background: The growing complexity of breast cancer treatment stems from both the rapid pace of new treatment approvals and the discovery of novel biomarkers. This is reflected in the need for frequent updates to medical guidelines to ensure an optimal standard of care for patients. AI, and specifically Large Language Models (LLMs), holds promise for helping doctors navigate this complexity.
Methods: We developed ATheNa-Breast, a chatbot that provides medical assistance by combining an LLM (GPT-4o-2024-08-06) with guideline documents from the AGO Breast Committee Germany (AGO) and the European Society of Medical Oncology (ESMO) through Retrieval Augmented Generation (RAG). To assess its ability to provide guideline-concordant responses to physicians, we created ATheNaBench, a novel dataset of 84 tasks, including multiple-choice questions, free-text questions, and longitudinal medical cases. Our evaluations compare the average accuracy on ATheNaBench of medical doctors (HCPs) with access to ATheNa-Breast (ATheNa group, n = 8) versus without access to ATheNa-Breast (control, n = 8), ATheNa-Breast alone (n = 10*), GPT-4o alone (n = 5*), and GPT-4o with web search alone (n = 5*).
Results: ATheNa-Breast alone [ATheNaBench score (ABs), 79.5%] outperforms GPT-4o alone (ABs, 66.7%) and GPT-4o with integrated web search (ABs, 73.8%). It also achieves higher guideline concordance than both HCP groups: clinicians with (ABs, 70.5%) and clinicians without (ABs, 70.2%) access to ATheNa-Breast as a decision support tool. The ATheNa group and the control group did not differ in guideline concordance. In a side-by-side comparison, access to ATheNa-Breast reduced the time HCPs needed to answer the questions and complete the medical cases by 15% (132 minutes in the ATheNa group versus 156 minutes in the control group).
Moreover, while 75% of HCPs subjectively reported saving considerable time when using ATheNa-Breast to make therapy decisions, none indicated they would feel confident relying on the AI to decide independently.
Conclusions: ATheNa-Breast alone demonstrated superior adherence to clinical guidelines compared to all other groups, including HCPs using it as a decision support tool. Access to ATheNa-Breast did not improve the accuracy of physicians' decisions but saved time. This finding contrasts with physicians' lack of trust in AI making therapy decisions autonomously. The gap between AI and HCPs might highlight untapped potential to enhance guideline-concordant care by encouraging HCPs to collaborate with AI more effectively, while preserving their lead in decision-making.
Overview of results.
|                   | GPT-4o | ATheNa-Breast | GPT-4o + Web Search | HCPs + ATheNa | HCPs    |
|-------------------|--------|---------------|---------------------|---------------|---------|
| N                 | 5*     | 10*           | 5*                  | 8             | 8       |
| Performance (ABs) | 66.7%  | 79.5%         | 73.8%               | 70.2%         | 70.5%   |
| Time              |        |               |                     | 132 min       | 156 min |
*Indicates number of independent repetitions.
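The reported 15% time saving and the score gaps between the AI systems can be re-derived from the figures in the abstract. The short Python sketch below does only that arithmetic; the function and variable names are illustrative assumptions and are not taken from the study.

```python
# Figures as reported in the abstract (times in minutes, scores in percent).
TIME_ATHENA_GROUP_MIN = 132   # HCPs with access to ATheNa-Breast
TIME_CONTROL_GROUP_MIN = 156  # HCPs without access

ABS_SCORES = {
    "ATheNa-Breast": 79.5,
    "GPT-4o": 66.7,
    "GPT-4o + web search": 73.8,
}


def relative_reduction(baseline: float, value: float) -> float:
    """Return the reduction of `value` relative to `baseline`, in percent."""
    return (baseline - value) / baseline * 100


def score_gap(scores: dict, model: str, reference: str) -> float:
    """Return the difference in percentage points between two reported scores."""
    return round(scores[model] - scores[reference], 1)


time_saved_pct = relative_reduction(TIME_CONTROL_GROUP_MIN, TIME_ATHENA_GROUP_MIN)
gap_vs_gpt4o = score_gap(ABS_SCORES, "ATheNa-Breast", "GPT-4o")
gap_vs_web = score_gap(ABS_SCORES, "ATheNa-Breast", "GPT-4o + web search")

print(f"Time saved by the ATheNa group: {time_saved_pct:.1f}%")
print(f"ATheNa-Breast vs GPT-4o: {gap_vs_gpt4o} percentage points")
print(f"ATheNa-Breast vs GPT-4o + web search: {gap_vs_web} percentage points")
```

The time saving comes out at roughly 15.4%, consistent with the rounded 15% stated in the results.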