This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Decoding AI Competence: Benchmarking Large Language Models (LLMs) in Ovarian Cancer Diagnosis and Treatment—A Systematic Evaluation of Generative AI Accuracy and Completeness
0
Citations
8
Authors
2026
Year
Abstract
<b>Objective</b>: To evaluate the practical value of DeepSeek-R1 and Doubao-1.5-pro in ovarian cancer management by examining their diagnostic and treatment-related competencies. <b>Methods</b>: Twenty key questions on ovarian cancer diagnosis and treatment were identified and divided into 4 domains of 5 questions each. Both large language models answered every question, and 5 gynecologic oncology chief physicians rated each answer on a 1-10 scale for completeness and accuracy. Any individual score, or mean score per question, above 7 was rated "Excellent." The Kruskal-Wallis test compared scores across the 4 categories within each LLM, and the Mann-Whitney-Wilcoxon test compared the two LLMs within each category. <b>Results</b>: A total of 200 scores were collected (100 per model). DeepSeek-R1 received 98 "Excellent" ratings versus 41 for Doubao-1.5-pro. All 20 DeepSeek-R1 responses achieved "Excellent" mean scores, compared with 9 for Doubao-1.5-pro, and DeepSeek-R1's scores showed less variability. Statistical tests revealed significant differences between the models, with DeepSeek-R1 outperforming Doubao-1.5-pro; Doubao-1.5-pro scored lower in all domains, especially "Medical". <b>Conclusions</b>: DeepSeek-R1 shows potential in ovarian cancer diagnosis and treatment but has limitations, including inaccuracies and overly technical responses stemming from outdated training data and a lack of humanistic elements. LLMs such as DeepSeek-R1 are useful for medical education and assistive diagnosis, but they require ongoing updates and refinement before broader clinical use. Selecting the appropriate LLM for a given medical task and improving response clarity and accuracy are crucial for future effectiveness.
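The statistical design described in the abstract can be sketched in a few lines with SciPy. This is a minimal illustration using synthetic scores (not the study's data); the domain names other than "Medical" are hypothetical placeholders, as the abstract does not list them.

```python
# Sketch of the abstract's two comparisons: Kruskal-Wallis across the 4
# domains within one model, Mann-Whitney-Wilcoxon between models per domain.
# Scores are randomly generated stand-ins for the 1-10 expert ratings.
import random
from scipy.stats import kruskal, mannwhitneyu

random.seed(0)
DOMAINS = ["Medical", "Diagnosis", "Treatment", "Prognosis"]  # placeholder names

# 5 questions x 5 raters = 25 scores (1-10) per domain per model
model_a = {d: [random.randint(7, 10) for _ in range(25)] for d in DOMAINS}
model_b = {d: [random.randint(4, 9) for _ in range(25)] for d in DOMAINS}

# Within-model comparison: do scores differ across the 4 domains?
h_stat, p_within = kruskal(*model_a.values())

# Between-model comparison within each domain
between = {d: mannwhitneyu(model_a[d], model_b[d]) for d in DOMAINS}

print(f"Kruskal-Wallis (model A across domains): H={h_stat:.2f}, p={p_within:.3f}")
for d, res in between.items():
    print(f"Mann-Whitney ({d}): U={res.statistic:.1f}, p={res.pvalue:.4f}")
```

Because these tests are rank-based, they suit ordinal 1-10 expert ratings, which generally cannot be assumed normally distributed.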
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,260 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,116 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,493 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,438 citations