This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Outstanding performance of ChatGPT on the obstetrics and gynecology board certification examination in Japan: Document and image-based questions analysis
Citations: 1
Authors: 4
Year: 2024
Abstract
ChatGPT is an artificial intelligence (AI) language model available online, trained on vast amounts of text to excel in natural language processing. Newer versions, such as ChatGPT-4, can also interpret images and files, extending their usefulness across various fields.1, 2 Regarding whether ChatGPT can answer medical examination questions correctly, previous research has shown that it can achieve passing scores across various medical fields.3 However, its performance in obstetrics and gynecology remains unclear, and it is still unknown how ChatGPT performs on image-based questions that require the interpretation of imaging tests and physical findings. Herein, we aimed to investigate ChatGPT's performance on the obstetrics and gynecology board certification examination conducted in Japan, focusing on both document-based and image-based questions.

The Japan Society of Obstetrics and Gynecology conducts the obstetrics and gynecology board certification examination in Japan annually. Eligibility is granted to those who have obtained a medical license, completed 2 years of junior residency, and undergone at least 3 years of training as obstetrician-gynecologists at a designated training facility. The examination covers four fields: perinatology, gynecologic oncology, reproductive endocrinology, and women's healthcare, and consists of approximately 120 multiple-choice questions. It includes not only document-based questions but also image-based questions that must be answered on the basis of ultrasound, magnetic resonance imaging, computed tomography, pathological images, cardiotocogram evaluation, and clinical photographs. For multiple-choice questions, an answer was considered incorrect unless all selected choices were correct.

ChatGPT-4 is an advanced version of OpenAI's conversational AI, offering improved language understanding, image interpretation, and generation capabilities; it provides more accurate, context-aware responses than its predecessors. On the board certification examinations of the past 3 years, ChatGPT-4 performed strongly, with accuracy rates of 70.2%, 64.8%, 66.7%, and 77.3% in perinatology, gynecologic oncology, reproductive endocrinology, and women's healthcare, respectively, although the accuracy rates in the actual examinations have not been disclosed. Additionally, ChatGPT-4's accuracy on image-based questions did not differ significantly from its accuracy on document-based questions (Table 1). These results suggest that ChatGPT has promising capabilities for accurately answering both document-based and image-based clinical questions in obstetrics and gynecology. Further evaluation of whether these models can make accurate judgments in real-world clinical scenarios is essential; continued advancement of AI models could further enhance their value in medical settings.

T.N. contributed to the conception, design, acquisition, analysis, and interpretation of data and wrote the manuscript. R.Y. contributed to the data analysis and interpretation and provided manuscript guidance. A.S. and A.O. contributed to the data interpretation, provided manuscript guidance, and supervised the research. All authors have reviewed and approved the final manuscript and agree to be accountable for all aspects of the study, ensuring its accuracy and integrity. The authors declare no conflicts of interest for this article. Data are available on request from the authors.
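The scoring rule described above (an answer counts as correct only if every selected choice is correct) can be sketched as follows. This is a minimal illustrative sketch, not the authors' analysis code; the function names and the example data are hypothetical.

```python
def is_correct(selected: set, answer_key: set) -> bool:
    """All-or-nothing rule: the selected choices must exactly match the key,
    with no required choice missing and nothing extra selected."""
    return selected == answer_key

def accuracy(responses) -> float:
    """Fraction of questions scored correct under the all-or-nothing rule.
    `responses` is a list of (selected_choices, answer_key) pairs."""
    correct = sum(is_correct(sel, key) for sel, key in responses)
    return correct / len(responses)

# Hypothetical example: three questions, one requiring two choices.
responses = [
    ({"a"}, {"a"}),            # exact match -> correct
    ({"b", "c"}, {"b", "c"}),  # both required choices -> correct
    ({"b"}, {"b", "c"}),       # partial selection -> incorrect
]
print(round(accuracy(responses), 2))  # -> 0.67
```

Under this rule, partially correct selections on multi-answer questions contribute nothing to the accuracy rate, which makes the reported per-field percentages a strict measure of performance.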