This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Comparison of artificial intelligence to expert physician assessments of real-world oncology cases.
0
Citations
7
Authors
2025
Year
Abstract
1564 Background: Artificial intelligence (AI) is increasingly being incorporated into oncology as a tool to support clinical decisions. AI tools such as ChatGPT or OpenEvidence provide responses to user-generated queries, whereas some institutions or companies, such as Primum, Inc., offer consultations with actual experts who provide personalized responses to clinician-submitted real-world cases. However, the value of AI tools in augmenting expert consultations continues to evolve. We report the results of a study comparing AI versus expert oncologists' responses to 107 real-world hematology/oncology cases.

Methods: The 107 cases comprised lymphomas (30), myeloma (24), leukemias (11), myeloid disorders (10), and classical hematology (32), assessed among 20 experts. Responses to de-identified cases submitted by practicing clinicians to Primum (www.primum.co) between June 2022 and July 2023 were compared to GPT-4 responses (openai.com/chatgpt). The instructional prompt to GPT-4 was: "You are an expert oncologist conversing with another oncologist as a peer. You prefer to rely on guidelines and data published in reputable medical journals when responding." Five expert faculty at our institution adjudicated the blinded comparative responses, recording their preference, quality and practical value scores, and a prediction of which response was AI-generated. Scores were compared by t-test to generate P-values between the expert and AI groups, and Pearson correlation was used for comparisons between adjudication scores.

Results: Expert responses were preferred by >50% of adjudicators in 75% of cases (deviation ±25%). Randomized AI responses were correctly identified 90% of the time. Mean expert vs. AI scores (Likert scale 0-4) for quality (2.0 vs. 2.1, P = 0.9) and practical value (2.1 vs. 2.1, P = 0.9) were equivalent. Interestingly, AI responses were preferred in 46% (n = 15) of classical hematology and 31% (n = 9) of lymphoma cases, largely because they were more concise. However, there was no concordance between high practical value scores and disease subtype for either group.

Conclusions: Expert physician responses were preferred over AI responses for most cases based on the level of detail presented, suggesting an implicit value of personalized responses compared to AI. Results showed no significant differences in quality or practical utility between AI-generated responses and those from experts, reflecting a similarity in the information extracted from standardized guidelines and pointing to a potential added value of AI in supporting clinical decision making. Our findings are limited by the broad coverage of hematologic conditions, for which experts and guidelines vary. Overall, these data suggest that while AI can supplement knowledge of management paradigms by providing basic management strategies, at present it cannot replace personalized expert consultation in clinical practice.
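As a rough, hypothetical sketch of the workflow described in the Methods, the Python snippet below shows how a case might be sent to GPT-4 with the study's instructional prompt (here via the OpenAI Python client, an assumption on our part; the abstract cites only openai.com/chatgpt) and how blinded adjudication scores could be compared with a t-test and Pearson correlation using scipy. The gpt4_response helper, the client setup, and all score values are illustrative placeholders, not the study's data or code.

```python
# Hypothetical sketch of the comparison described in the Methods.
# All data values below are illustrative placeholders, not the study's data.
import numpy as np
from scipy import stats
from openai import OpenAI  # assumes the openai Python client (v1+); the study may have used the ChatGPT interface

SYSTEM_PROMPT = (
    "You are an expert oncologist conversing with another oncologist as a "
    "peer. You prefer to rely on guidelines and data published in reputable "
    "medical journals when responding."
)

def gpt4_response(case_text: str) -> str:
    """Send one de-identified case to GPT-4 with the study's instructional prompt."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": case_text},
        ],
    )
    return completion.choices[0].message.content

# Hypothetical Likert-scale (0-4) quality scores from blinded adjudicators
expert_quality = np.array([2, 3, 2, 1, 3, 2, 2, 3])  # expert responses
ai_quality = np.array([2, 2, 3, 2, 2, 1, 3, 2])      # GPT-4 responses

# Two-sample t-test between the expert and AI score distributions
t_stat, p_val = stats.ttest_ind(expert_quality, ai_quality)
print(f"quality: t = {t_stat:.2f}, P = {p_val:.2f}")

# Pearson correlation between two adjudication dimensions,
# e.g. quality vs. practical value scores for the same responses
practical_value = np.array([2, 3, 3, 1, 2, 2, 2, 3])
r, p_corr = stats.pearsonr(expert_quality, practical_value)
print(f"quality vs. practical value: r = {r:.2f}, P = {p_corr:.2f}")
```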
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,260 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,116 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,493 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,438 citations