This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Decoding AI Competence: Benchmarking Large Language Models (LLMs) in Ovarian Cancer Diagnosis and Treatment—A Systematic Evaluation of Generative AI Accuracy and Completeness
0
Citations
8
Authors
2026
Year
Abstract
<b>Objective</b>: To evaluate the practical value of DeepSeek-R1 and Doubao-1.5-pro in ovarian cancer management by examining their diagnostic and treatment-related competencies. <b>Methods</b>: Twenty key questions on ovarian cancer diagnosis and treatment were identified and divided into 4 domains of 5 questions each. Both large language models answered every question, and 5 gynecologic oncology chief physicians rated each answer on a 1-10 scale for completeness and accuracy. Any individual score, or mean score per question, above 7 was rated "Excellent." The Kruskal-Wallis test compared scores across the 4 categories within each LLM, and the Mann-Whitney-Wilcoxon test compared the two LLMs within each category. <b>Results</b>: A total of 200 scores were collected (100 per model). DeepSeek-R1 received 98 "Excellent" ratings versus 41 for Doubao-1.5-pro. All 20 DeepSeek-R1 responses achieved "Excellent" mean scores, compared with 9 for Doubao-1.5-pro, and DeepSeek-R1's scores showed less variability. Statistical tests revealed significant differences between the models, with DeepSeek-R1 outperforming Doubao-1.5-pro; Doubao-1.5-pro scored lower in all domains, especially "Medical". <b>Conclusions</b>: DeepSeek-R1 shows potential in ovarian cancer diagnosis and treatment but has limitations, including inaccuracies and overly technical responses stemming from outdated training data and a lack of humanistic elements. LLMs such as DeepSeek-R1 are useful for medical education and assistive diagnosis, but they require ongoing updates and refinement before broader clinical use. Selecting the appropriate LLM for a given medical task and improving response clarity and accuracy are crucial for future effectiveness.
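The statistical design described in the abstract can be sketched in a few lines with SciPy. This is a minimal illustration using synthetic scores (not the study's data); the domain names other than "Medical" are hypothetical placeholders, as the abstract does not list them.

```python
# Sketch of the abstract's two comparisons: Kruskal-Wallis across the 4
# domains within one model, Mann-Whitney-Wilcoxon between models per domain.
# Scores are randomly generated stand-ins for the 1-10 expert ratings.
import random
from scipy.stats import kruskal, mannwhitneyu

random.seed(0)
DOMAINS = ["Medical", "Diagnosis", "Treatment", "Prognosis"]  # placeholder names

# 5 questions x 5 raters = 25 scores (1-10) per domain per model
model_a = {d: [random.randint(7, 10) for _ in range(25)] for d in DOMAINS}
model_b = {d: [random.randint(4, 9) for _ in range(25)] for d in DOMAINS}

# Within-model comparison: do scores differ across the 4 domains?
h_stat, p_within = kruskal(*model_a.values())

# Between-model comparison within each domain
between = {d: mannwhitneyu(model_a[d], model_b[d]) for d in DOMAINS}

print(f"Kruskal-Wallis (model A across domains): H={h_stat:.2f}, p={p_within:.3f}")
for d, res in between.items():
    print(f"Mann-Whitney ({d}): U={res.statistic:.1f}, p={res.pvalue:.4f}")
```

Because these tests are rank-based, they suit ordinal 1-10 expert ratings, which generally cannot be assumed normally distributed.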
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,260 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,116 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,493 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,438 citations