Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
Educational Utility of Clinical Vignettes Generated in Japanese by ChatGPT-4: Mixed Methods Study (Preprint)
0
Zitationen
7
Autoren
2024
Jahr
Abstract
<sec> <title>BACKGROUND</title> Evaluating the accuracy and educational utility of artificial intelligence–generated medical cases, especially those produced by large language models such as ChatGPT-4 (developed by OpenAI), is crucial yet underexplored. </sec> <sec> <title>OBJECTIVE</title> This study aimed to assess the educational utility of ChatGPT-4–generated clinical vignettes and their applicability in educational settings. </sec> <sec> <title>METHODS</title> Using a convergent mixed methods design, a web-based survey was conducted from January 8 to 28, 2024, to evaluate 18 medical cases generated by ChatGPT-4 in Japanese. In the survey, 6 main question items were used to evaluate the quality of the generated clinical vignettes and their educational utility, which are information quality, information accuracy, educational usefulness, clinical match, terminology accuracy (TA), and diagnosis difficulty. Feedback was solicited from physicians specializing in general internal medicine or general medicine and experienced in medical education. Chi-square and Mann-Whitney <i>U</i> tests were performed to identify differences among cases, and linear regression was used to examine trends associated with physicians’ experience. Thematic analysis of qualitative feedback was performed to identify areas for improvement and confirm the educational utility of the cases. </sec> <sec> <title>RESULTS</title> Of the 73 invited participants, 71 (97%) responded. The respondents, primarily male (64/71, 90%), spanned a broad range of practice years (from 1976 to 2017) and represented diverse hospital sizes throughout Japan. The majority deemed the information quality (mean 0.77, 95% CI 0.75-0.79) and information accuracy (mean 0.68, 95% CI 0.65-0.71) to be satisfactory, with these responses being based on binary data. The average scores assigned were 3.55 (95% CI 3.49-3.60) for educational usefulness, 3.70 (95% CI 3.65-3.75) for clinical match, 3.49 (95% CI 3.44-3.55) for TA, and 2.34 (95% CI 2.28-2.40) for diagnosis difficulty, based on a 5-point Likert scale. Statistical analysis showed significant variability in content quality and relevance across the cases (<i>P</i>&lt;.001 after Bonferroni correction). Participants suggested improvements in generating physical findings, using natural language, and enhancing medical TA. The thematic analysis highlighted the need for clearer documentation, clinical information consistency, content relevance, and patient-centered case presentations. </sec> <sec> <title>CONCLUSIONS</title> ChatGPT-4–generated medical cases written in Japanese possess considerable potential as resources in medical education, with recognized adequacy in quality and accuracy. Nevertheless, there is a notable need for enhancements in the precision and realism of case details. This study emphasizes ChatGPT-4’s value as an adjunctive educational tool in the medical field, requiring expert oversight for optimal application. </sec>
Ähnliche Arbeiten
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8.292 Zit.
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8.143 Zit.
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7.539 Zit.
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5.776 Zit.
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5.452 Zit.