This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Artificial intelligence‑driven virtual tumor board enhances precision care in myelodysplastic syndromes (MDS)
0
Citations
19
Authors
2025
Year
Abstract
Background and Aims: Myelodysplastic syndromes (MDS) present significant diagnostic and therapeutic challenges, often requiring input from multidisciplinary teams and subspecialty-trained leukemia and pathology experts. The complexity is compounded by evolving classification systems (e.g., the WHO classification and the International Consensus Classification, ICC) and an expanding therapeutic landscape, which complicate real-time clinical decision-making. Although large language models (LLMs) such as ChatGPT have demonstrated promise in medical domains, they frequently yield inaccurate or overly generalized responses when applied to complex hematologic scenarios. Even state-of-the-art models, which are capable of human-like reasoning and sometimes outperform clinicians in general tasks, have not been systematically evaluated in advanced hematologic disorders such as MDS. To address this gap, we first assessed the performance of leading LLMs, including ChatGPT, Claude, and DeepSeek, on challenging, real-world MDS cases. After identifying their limitations, we built the Virtual MDS Panel (VMP), a coordinated AI system in which AI agents (task-bound software assistants that understand natural language and help users complete tasks, answer questions, and make decisions efficiently) are trained on domain knowledge (WHO/ICC; IPSS-R/IPSS-M; NCCN) and explicit decision rules to collaborate and produce tumor-board-level recommendations.

Methods: VMP comprises four specialized AI agents: a moderator agent that receives clinical queries; a pathology agent trained on WHO/ICC criteria; a prognostication agent using risk models (IPSS, IPSS-R, IPSS-M); and a therapy agent grounded in NCCN and ELN guidelines. For each case, a physician submits a clinical scenario to the moderator, which breaks down the query and delegates sub-tasks to the relevant agents. The moderator then synthesizes their responses into a structured output. To evaluate performance, we created a test set of 30 complex, real-world MDS cases.
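The moderator-and-agents workflow described in the Methods can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not describe VMP at code level, so the function names (`pathology_agent`, `prognosis_agent`, `therapy_agent`, `moderate`), the sub-task labels, and the canned return strings are hypothetical stand-ins. In particular, the sketch takes the list of sub-tasks as an input, whereas the moderator described above derives them from the clinical query itself.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for the three specialist agents. In a real system
# each would wrap a language model grounded in the named guidelines.
def pathology_agent(case: str) -> str:
    return "pathology: classify per WHO/ICC criteria"

def prognosis_agent(case: str) -> str:
    return "prognosis: score with IPSS / IPSS-R / IPSS-M"

def therapy_agent(case: str) -> str:
    return "therapy: options per NCCN / ELN guidance"

# Assumed mapping from sub-task label to the agent responsible for it.
AGENTS: Dict[str, Callable[[str], str]] = {
    "diagnosis": pathology_agent,
    "prognosis": prognosis_agent,
    "therapy": therapy_agent,
}

def moderate(case: str, subtasks: List[str]) -> Dict[str, str]:
    """Delegate each sub-task to its agent and collect a structured output."""
    unknown = [task for task in subtasks if task not in AGENTS]
    if unknown:
        raise ValueError(f"no agent for sub-task(s): {unknown}")
    # One entry per sub-task, in the order the sub-tasks were requested.
    return {task: AGENTS[task](case) for task in subtasks}
```

Calling `moderate(case_text, ["diagnosis", "prognosis", "therapy"])` returns one labeled entry per sub-task, which loosely mirrors the structured output the moderator is said to synthesize.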
VMP responses were compared with those from leading LLMs (ChatGPT-4o, GPT-o3, Claude, DeepSeek). Eleven international MDS experts, blinded to response sources, independently scored outputs for accuracy, clinical relevance, and completeness on a Likert scale of 1 to 5. They also assessed diagnostic reasoning, prognostic validity, and treatment recommendations, and classified factual errors as none, minor, or major. To evaluate the consistency of expert ratings, we used the intraclass correlation coefficient (ICC; here the statistic, not the classification system) to measure agreement on numerical scores and Cohen's κ (kappa) to assess agreement in identifying errors.

Results: VMP achieved an overall expert-rated accuracy of 93%, outperforming GPT-o3 (82%), GPT-4o (80%), DeepSeek (71%), and Claude (66%). By domain, VMP scored highest, with a mean across all experts of 4.2 overall on the 1-5 scale (diagnosis 4.3, prognosis 4.4, therapy selection 3.9). The corresponding scores (diagnosis / prognosis / therapy selection) were 3.6 overall for GPT-o3 (3.7 / 3.6 / 3.6), 3.2 for GPT-4o (3.1 / 3.2 / 3.4), 3.0 for DeepSeek (2.9 / 3.0 / 3.1), and 2.9 for Claude (2.7 / 2.9 / 3.1). According to expert opinion, VMP registered major factual errors at a rate of just 9%, compared with 26% for GPT-o3, 26% for GPT-4o, 33% for DeepSeek, and 36% for Claude. Minor factual errors were present at 36% for VMP versus 47-52% for the other four models. Experts showed strong agreement in their evaluations, with high consistency in scoring (ICC = 0.81) and in identifying AI errors or hallucinations (κ = 0.76), confirming the reliability of the review process.

Conclusions: We developed an advanced, MDS-focused AI system that increased accuracy and alignment with expert practice, outperforming current state-of-the-art general AI models. By emulating a virtual tumor board, the system offers structured, evidence-based guidance that can aid hematologists in optimizing precision care.
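To show how a κ value such as the one reported for error identification is computed, here is a minimal sketch of Cohen's kappa for two raters who each label the same responses as none, minor, or major. It rests on assumptions: Cohen's κ is defined for a pair of raters, and the abstract does not say how agreement among the eleven experts was aggregated, so this shows only the pairwise statistic. The function name `cohens_kappa` and the example labels are illustrative and are not study data.

```python
from collections import Counter
from typing import Sequence

def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa: chance-corrected agreement between two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)

# Made-up labels for six responses (illustration only, not study data).
expert_1 = ["none", "minor", "major", "none", "minor", "none"]
expert_2 = ["none", "minor", "minor", "none", "major", "none"]
kappa = cohens_kappa(expert_1, expert_2)
```

In this made-up example the raters agree on 4 of 6 items (observed agreement 24/36) and chance agreement is 14/36, so κ = (24/36 − 14/36) / (1 − 14/36) = 10/22 ≈ 0.45.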
Related works
The 2016 revision to the World Health Organization classification of myeloid neoplasms and acute leukemia
2016 · 10,063 citations
Human acute myeloid leukemia is organized as a hierarchy that originates from a primitive hematopoietic cell
1997 · 6,900 citations
Diagnosis and management of AML in adults: 2017 ELN recommendations from an international expert panel
2016 · 5,784 citations
Proposals for the Classification of the Acute Leukaemias French‐American‐British (FAB) Co‐operative Group
1976 · 5,587 citations
Genomic and Epigenomic Landscapes of Adult De Novo Acute Myeloid Leukemia
2013 · 5,080 citations
Authors
- David M. Swoboda
- Amy E. DeZern
- James T. England
- Sangeetha Venugopal
- Thomas J. Kehoe
- Brandon J. Aubrey
- Marco Gabriele Raddi
- Angela Consagra
- Jia-Sheng Wang
- Gustavo Rivero
- Maximilian Stahl
- Amer M. Zeidan
- Torsten Haferlach
- Andrew M. Brunner
- Rena Buckstein
- Valeria Santini
- Matteo Giovanni Della Porta
- Mikkael A. Sekeres
- Aziz Nazha
Institutions
- Tampa General Hospital (US)
- Sidney Kimmel Comprehensive Cancer Center (US)
- Health Sciences Centre (CA)
- Sunnybrook Health Science Centre (CA)
- Sylvester Comprehensive Cancer Center (US)
- University of South Florida (US)
- Massachusetts General Hospital (US)
- University of Florence (IT)
- The Ohio State University (US)
- Yale University (US)
- Munich Leukemia Laboratory (Germany) (DE)
- IRCCS Humanitas Research Hospital (IT)
- Sidney Kimmel Cancer Center (US)