This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
GynMedEval: A comprehensive dataset for evaluating the diagnostic capability of large language models in gynecology
Citations: 0
Authors: 7
Year: 2025
Abstract
In recent years, large language models (LLMs) have achieved significant breakthroughs across various fields and have shown considerable potential in the medical domain. However, existing studies fall short in evaluating the diagnostic capabilities of LLMs on complex clinical cases. To address this gap, we developed GynMedEval, a comprehensive dataset designed to assess the performance of LLMs in gynecologic disease diagnosis. The dataset is sourced from real-world cases and the Chinese Clinical Case Outcomes Database and comprises 515 samples reviewed by physicians with senior titles (Associate Chief Physician or above). Each sample includes more than 50 items covering the patient's physiological characteristics and laboratory test data. We transformed the samples into a multiple-choice format for evaluation. Several state-of-the-art LLMs were assessed on the dataset under various diagnostic scenarios, including zero-shot and few-shot settings. The results revealed significant strengths in diagnosing common conditions, but none of the models achieved an accuracy above 90%. The GynMedEval dataset addresses a critical gap in the evaluation of LLMs for gynecologic diagnosis. It will enable a deeper analysis of these models' performance, fostering their application in healthcare to enhance diagnostic accuracy, improve patient privacy, and ensure greater convenience.
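The abstract describes a multiple-choice evaluation in zero-shot and few-shot settings but gives no implementation details. The following is only a minimal sketch of how such a setup might look, not the authors' method: the `Case` structure, the prompt wording, and the function names `format_case`, `build_prompt`, and `accuracy` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """Hypothetical container for one multiple-choice sample (not the dataset's real schema)."""
    findings: str              # patient characteristics and lab data as text
    options: dict[str, str]    # option letter -> candidate diagnosis
    answer: str                # letter of the physician-reviewed diagnosis


def format_case(case: Case, with_answer: bool = False) -> str:
    """Render one case as a multiple-choice question, optionally with its answer."""
    lines = [f"Case: {case.findings}"]
    lines += [f"{letter}. {text}" for letter, text in sorted(case.options.items())]
    lines.append(f"Answer: {case.answer}" if with_answer else "Answer:")
    return "\n".join(lines)


def build_prompt(case: Case, shots: tuple[Case, ...] = ()) -> str:
    """Zero-shot when `shots` is empty; few-shot when solved examples are prepended."""
    blocks = [format_case(shot, with_answer=True) for shot in shots]
    blocks.append(format_case(case))
    return "\n\n".join(blocks)


def accuracy(predictions: list[str], cases: list[Case]) -> float:
    """Fraction of cases where the predicted option letter matches the reference."""
    if not cases:
        return 0.0
    correct = sum(
        pred.strip().upper() == case.answer
        for pred, case in zip(predictions, cases)
    )
    return correct / len(cases)
```

In this sketch, the model's reply is assumed to be a single option letter; a real harness would need its own logic for extracting the chosen option from free-form output.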
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,239 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,095 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,463 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,428 citations