This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
RETRACTED: Artificial Intelligence on the Exam Table: ChatGPT's Advancement in Urology Self-assessment
Citations: 4
Authors: 5
Year: 2023
Abstract
Urology Practice · Editorials · 1 Nov 2023

Angelo Cadiente (Hackensack Meridian School of Medicine, Nutley, New Jersey; https://orcid.org/0009-0007-9933-2804), Jamie Chen (Hackensack Meridian School of Medicine, Nutley, New Jersey; https://orcid.org/0000-0003-0572-2291), Jennifer Nguyen (Department of Urology, Hackensack University Medical Center, Hackensack, New Jersey), Hossein Sadeghi-Nejad (Department of Urology, NYU Grossman School of Medicine, New York, New York), and Mubashir Billah (Hackensack Meridian School of Medicine, Nutley, New Jersey; Department of Urology, Hackensack University Medical Center, Hackensack, New Jersey)

Correspondence: Angelo Cadiente, Hackensack Meridian School of Medicine, 123 Metro Blvd, Nutley, NJ 07110; telephone: 201-487-8866; email: [email protected]

https://doi.org/10.1097/UPJ.0000000000000446

ChatGPT-3.5's underperformance on the 2022 Self-Assessment Study Program (SASP) examination presents an ideal time to address the progress of artificial intelligence (AI) in urology.1 Previously, we had high hopes that AI could be used in the clinical support setting and, to our excitement, it has excelled.2,3 Gabrielson et al mused that the use of AI in the field is "limited only by our imagination."4 In the interest of progress, the large language model (LLM) was then assessed by Huynh et al for its capabilities in urological education, where it had a poor showing of 28.2% on the SASP.1 This unimpressive performance raised the question: are we overestimating the capabilities of LLMs, or are we merely witnessing the growing pains of a developing technology? Perhaps we can boast that urology is more complex than other disciplines, leading to specialty-specific or even topic-specific competencies of the LLM! While we partially touch upon this nuance, this editorial serves to review the landscape of AI in urology and offer findings to support our optimism about its progression.

LLMs have performed well on other disciplines' examinations, hence our surprise when comparing ChatGPT's performance on the SASP to the developing body of medical literature in other specialties. The accuracy of 28.2% falls short of its performance on the 2021 and 2022 American College of Gastroenterology self-assessment examinations (68.3% with ChatGPT-3.0), the Dermatology Specialty Certificate Examination (63% with ChatGPT-3.5), and the Ophthalmic Knowledge Assessment Program examination (46.4% with ChatGPT-3.5; see Figure).5-7 Its propensity to err is attributed to a slew of causes, including its potential to generate false information (hallucinate), its knowledge cutoff of September 2021, a lack of indexed information, and an inability to adequately contextualize an inquiry.5-7 These shortcomings have been documented in the literature, yet have shown improvement with each successive version of ChatGPT, especially between 3.5 and 4.0.5-7 Additionally, ChatGPT has demonstrated uses outside the realm of medical education, including clinical support tasks such as writing patient discharge summaries and responding to patient portal messages.2,3
Figure. ChatGPT-3.0, -3.5, and -4.0 performance on standardized urology, ophthalmology, gastroenterology, and dermatology specialty exams.

When observing ChatGPT's struggles in urology in comparison with the relative successes seen in other fields, it is important to consider possible root causes. Is the indexed urological information limited? Is the learning curve of the field steeper? Or perhaps human error is the cause and we are using LLMs incorrectly? ChatGPT's performance has been stagnant in urology. On the 2022 SASP, ChatGPT-3.0 scored 30.0%, similar to the succeeding ChatGPT-3.5's 28.2%.8 While the ChatGPT training data sets remain undisclosed, the use of the 2022 SASP implies that the questions postdate the knowledge cutoff and that the question set is not indexed in the training data. In short, these studies are evaluating ChatGPT's capacity to answer urological questions of varying difficulty using its existing training. The consistency of these scores demonstrates that the urological training of ChatGPT-3.0 and ChatGPT-3.5 is similar.

Recognizing the rapidly progressing nature of LLMs during the AI boom, we reran the study with ChatGPT-4, the latest version of ChatGPT, boasting increased processing power and improved contextual understanding while maintaining the same data set timeline as its predecessor. Employing the May 24 version of ChatGPT-4, we followed the same methodology as the Huynh et al study for multiple-choice questions, excluding 18 that were associated with figures on the 2022 SASP.1 We found that ChatGPT-4 achieved an accuracy of 49.2%, correctly answering 65 of 132 questions. Compared with Huynh et al's original study, ChatGPT-4 scored 21.0 percentage points higher, with 27 more correct answers.1 Although this score falls 1 question short of the requisite 50% for obtaining continuing medical education credit, ChatGPT-4 showcases a major advancement for LLMs in the context of medical examinations.9 Thus, ChatGPT in urology is unmistakably evolving. These findings are especially important when contrasted with the American College of Gastroenterology self-assessment examinations, where ChatGPT-4 not only failed but also scored worse than its predecessor.5

By comparing ChatGPT-4's accuracy to Huynh et al's study of ChatGPT-3.5 using the same question set, it can be concluded that either the medically indexed information in urology has increased or the new model's refined contextual understanding is enabling it to use the indexed information more effectively.1 The relative improvement is likely a combination of both. Additionally, the quality of ChatGPT-4's explanations suggests an enhanced comprehension of the test writer's objectives behind each inquiry, aptly identifying the intended subject matter irrespective of its complexity.

While ChatGPT is making significant strides toward potential use in urological education, further progress remains to be made to improve accuracy and decrease the risk of disseminating medical misinformation among urological trainees. However, given the observed improvement across successive versions, we can view these challenges not as stumbling blocks but as stepping stones toward increased application of these models in urology. With continued refinements, LLMs are closing in on their potential to revolutionize medicine, and the field of urology is in a prime position to benefit greatly from these advances.
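As a minimal sketch, the following Python example illustrates how a multiple-choice evaluation of this kind could be scripted and scored, and how the reported figures reduce to simple arithmetic. It is an illustration under stated assumptions, not the protocol of this editorial or of Huynh et al: the question file sasp_2022_questions.json, its record layout, and the prompt wording are hypothetical, and the OpenAI chat-completions client stands in for whatever interface was actually used.

```python
# Illustrative sketch only: the question file, its schema, and the prompt
# format are hypothetical, not the protocol of this editorial or Huynh et al.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(stem: str, choices: dict) -> str:
    """Pose one multiple-choice question and return the model's letter answer."""
    prompt = stem + "\n" + "\n".join(f"{k}. {v}" for k, v in choices.items())
    resp = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # minimize run-to-run variation in the chosen letter
        messages=[
            {"role": "system", "content": "Answer with a single letter only."},
            {"role": "user", "content": prompt},
        ],
    )
    return resp.choices[0].message.content.strip()[:1]

# Hypothetical file: one record per non-figure 2022 SASP question,
# e.g. {"stem": "...", "choices": {"A": "...", ...}, "answer": "C"}.
with open("sasp_2022_questions.json") as f:
    bank = json.load(f)

correct = sum(ask(q["stem"], q["choices"]) == q["answer"] for q in bank)
total = len(bank)
accuracy = correct / total

# Arithmetic reported above: 65/132 = 49.2%, which is 21.0 percentage points
# above ChatGPT-3.5's 28.2%, and 1 correct answer short of the 50%
# (66 of 132) required for continuing medical education credit.
print(f"Accuracy: {correct}/{total} = {accuracy:.1%}")
print(f"Difference vs ChatGPT-3.5's 28.2%: {accuracy - 0.282:+.1%}")
print(f"Correct answers needed for CME credit (50%): {(total + 1) // 2}")
```

Setting temperature to 0 keeps the model's letter choice as repeatable as the API allows, which matters when, as here, a single question separates passing from failing.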
References

1. New artificial intelligence ChatGPT performs poorly on the 2022 Self-Assessment Study Program for urology. Urol Pract. 2023;10(4):409-415.
2. ChatGPT: the future of discharge summaries? Lancet Digit Health. 2023;5(3):e107-e108.
3. Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern Med. 2023;183(6):589-596.
4. Harnessing generative artificial intelligence to improve efficiency among urologists: welcome ChatGPT. J Urol. 2023;209(5):827-829.
5. Chat generative pretrained transformer fails the multiple-choice American College of Gastroenterology self-assessment test. Am J Gastroenterol. 2023; doi:10.14309/ajg.0000000000002320.
6. Performance of ChatGPT on dermatology Specialty Certificate Examination multiple choice questions. Clin Exp Dermatol. 2023;llad197.
7. Performance of an artificial intelligence chatbot in ophthalmic knowledge assessment. JAMA Ophthalmol. 2023;141(6):589.
8. ChatGPT performance on the American Urological Association (AUA) Self-Assessment Study Program and the potential influence of artificial intelligence (AI) in urologic training. Urology. 2023;177:29-33.
9. American Urological Association. Self-Assessment Study Program Frequently Asked Questions. Accessed June 23, 2023. https://www.auanet.org/about-us/sasp-test-page-js/self-assessment-study-program-frequently-asked-questions

Support: None.

Conflict of Interest Disclosures: The authors have no conflicts of interest to disclose.

Ethics Statement: This study was deemed exempt from Institutional Review Board review.

Author Contributions: Conception and design: AC, JC, HS-N, MB; Data analysis and interpretation: AC, JC, JN, HS-N; Critical revision of the manuscript for scientific and factual content: AC, JC, JN, HS-N, MB; Drafting the manuscript: AC, JC; Statistical analysis: AC, JC; Supervision: JC, JN, HS-N, MB.

Data Availability: The data that support the findings of this study are available from the American Urological Association's Self-Assessment Study Program and its licensors, but restrictions apply to the availability of these data. Access to the question sets can be acquired by purchasing the course: https://auau.auanet.org/content/self-assessment-study-program-sasp-2022.
© 2023 by American Urological Association Education and Research, Inc.
Volume 10, Issue 6, November 2023, Pages 521-523
Keywords: education, medical, continuing; urology; artificial intelligence; natural language processing
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,260 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,116 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,493 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,438 citations