This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Model Evolution and System Roles Influence the Performance of ChatGPT on Chinese Medical Licensing Exams: A Comparative Study (Preprint)

2023 · 0 citations · Open Access


Abstract

<sec> <title>BACKGROUND</title> With the increasing application of large language models (LLMs) such as ChatGPT across industries, their potential in the medical domain, especially in standardized examinations, has become a focus of research. </sec> <sec> <title>OBJECTIVE</title> To assess the clinical performance of ChatGPT, focusing on its accuracy and reliability on the Chinese National Medical Licensing Examination (CNMLE). </sec> <sec> <title>METHODS</title> The CNMLE 2022 question set, consisting of 500 single-answer multiple-choice questions, was reclassified into 15 medical sub-specialties. Each question was tested 8 to 12 times in Chinese on the OpenAI platform from April 24 to May 15, 2023. Three key factors were considered: the model version (GPT-3.5 or GPT-4.0), the designation in the prompt of a system role tailored to each medical sub-specialty, and repeated querying to assess response coherence. The passing threshold was set at 60% accuracy. χ2 tests and Kappa values were used to evaluate the models' accuracy and consistency. </sec> <sec> <title>RESULTS</title> GPT-4.0 achieved a passing accuracy of 71.0% to 74.7%, significantly higher than that of GPT-3.5 (50.3% to 54.8%, P < 0.001). Both models showed relatively high coherence between the first and second responses, with Kappa values of 0.778 and 0.610, respectively. System roles increased accuracy for both GPT-4.0 (by 0.3% to 3.7%) and GPT-3.5 (by 1.3% to 4.5%) and raised the Kappa values by 0.023 and 0.035, respectively. In the sub-specialty analysis, GPT-4.0 passed the threshold in 14 of 15 sub-specialties, while GPT-3.5 did so in 7 of 15, on the first response. </sec> <sec> <title>CONCLUSIONS</title> GPT-4.0 passed the CNMLE and outperformed GPT-3.5 in key areas such as accuracy, consistency, and medical sub-specialty expertise. Adding a system role improved the models' reliability and answer coherence. GPT-4.0 shows promising potential for medical education and clinical practice and merits further study. </sec>
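The abstract relies on three quantities: per-model accuracy compared against a 60% passing threshold, χ2 tests comparing two accuracies, and Kappa values measuring agreement between repeated responses. The following is a minimal Python sketch of how such quantities can be computed. It is not the authors' analysis code. All function names are hypothetical. It assumes that Kappa is Cohen's kappa computed on the option letter chosen in two runs, and that the χ2 test is a Pearson test on a 2x2 correct/incorrect table without continuity correction. The paper may have made different choices.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def accuracy(answers: Sequence[str], key: Sequence[str]) -> float:
    """Fraction of questions whose chosen option matches the answer key."""
    if len(answers) != len(key) or not key:
        raise ValueError("answers and key must be non-empty and equal in length")
    return sum(a == k for a, k in zip(answers, key)) / len(key)


def passes(acc: float, threshold: float = 0.60) -> bool:
    """Apply the 60% passing threshold stated in the abstract."""
    return acc >= threshold


def cohens_kappa(first: Sequence[str], second: Sequence[str]) -> float:
    """Cohen's kappa between two response runs on the same questions.

    Assumption: agreement is measured on the option letter chosen in each run.
    kappa = (p_o - p_e) / (1 - p_e)
    """
    n = len(first)
    if n == 0 or n != len(second):
        raise ValueError("runs must be non-empty and equal in length")
    p_o = sum(a == b for a, b in zip(first, second)) / n  # observed agreement
    c1, c2 = Counter(first), Counter(second)
    p_e = sum(c1[label] * c2[label] for label in c1) / (n * n)  # chance agreement
    if p_e == 1.0:
        # Degenerate case: both runs used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def chi2_2x2(correct_a: int, n_a: int, correct_b: int, n_b: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) comparing two accuracies.

    Returns (statistic, p-value) for 1 degree of freedom.
    """
    table = [[correct_a, n_a - correct_a], [correct_b, n_b - correct_b]]
    rows = [n_a, n_b]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in rows or 0 in cols:
        raise ValueError("every row and column total must be non-zero")
    total = n_a + n_b
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

For example, with purely hypothetical counts, `chi2_2x2(30, 50, 10, 50)` compares 60% and 20% accuracy on 50 questions each and yields a statistic of about 16.67 with P < 0.001. A real analysis would substitute the per-model counts of correct answers out of the 500 questions.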

Topics: Artificial Intelligence in Healthcare and Education; Radiomics and Machine Learning in Medical Imaging; COVID-19 diagnosis using AI
