This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

GynMedEval: A comprehensive dataset for evaluating the diagnostic capability of large language models in gynecology

2025 · 0 citations · International Journal of Gynecology & Obstetrics · Open Access

Abstract

In recent years, large language models (LLMs) have achieved significant breakthroughs across various fields and have shown considerable potential in the medical domain. However, existing studies fall short in evaluating the diagnostic capabilities of LLMs in complex clinical cases. To address this gap, we developed GynMedEval, a comprehensive dataset designed to assess the performance of LLMs in gynecologic disease diagnosis. The dataset is sourced from real-world cases and the Chinese Clinical Case Outcomes Database and comprises 515 samples reviewed by physicians with senior titles (Associate Chief Physician or above). Each sample includes more than 50 items of patient physiological characteristics and laboratory test data. We transformed the samples into a multiple-choice format for evaluation. Several state-of-the-art LLMs were assessed on the dataset under various diagnostic scenarios, including zero-shot and few-shot settings. The results revealed significant strengths in diagnosing common conditions, but none of the models achieved an accuracy above 90%. The GynMedEval dataset addresses a critical gap in the evaluation of LLMs for gynecologic diagnosis. It will enable a deeper analysis of these models' performance, fostering their application in healthcare to enhance diagnostic accuracy, improve patient privacy, and ensure greater convenience.
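
The abstract describes a multiple-choice evaluation run in zero-shot and few-shot settings and scored by accuracy. The following is a minimal sketch of how such an evaluation could be structured, not the authors' implementation. The `Item` schema, the prompt layout, and the function names `build_prompt` and `accuracy` are all assumptions made for illustration; the actual GynMedEval format and prompts are not given on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    """One multiple-choice case (hypothetical schema, not the GynMedEval format)."""
    case_text: str            # patient characteristics and laboratory findings
    options: dict[str, str]   # option label -> candidate diagnosis
    answer: str               # label of the correct option


def _render(item: Item, with_answer: bool) -> str:
    """Render one case as text, optionally including its reference answer."""
    opts = "\n".join(f"{label}. {text}" for label, text in sorted(item.options.items()))
    tail = f"Answer: {item.answer}" if with_answer else "Answer:"
    return f"Case: {item.case_text}\n{opts}\n{tail}"


def build_prompt(item: Item, shots: Optional[list[Item]] = None) -> str:
    """Zero-shot if `shots` is empty or None; few-shot if solved examples are given."""
    blocks = [_render(s, with_answer=True) for s in (shots or [])]
    blocks.append(_render(item, with_answer=False))
    return "\n\n".join(blocks)


def accuracy(predictions: list[str], items: list[Item]) -> float:
    """Fraction of items whose predicted option label matches the reference label."""
    if len(predictions) != len(items):
        raise ValueError("predictions and items must have the same length")
    if not items:
        return 0.0
    correct = sum(p.strip().upper() == it.answer for p, it in zip(predictions, items))
    return correct / len(items)
```

In this sketch the only difference between the two settings is whether solved examples are prepended to the prompt; the model call itself is deliberately left out, since the page does not say which models or APIs were used.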

Topics

Artificial Intelligence in Healthcare and Education · Topic Modeling · Machine Learning in Healthcare
