Assessing ChatGPT-4's Proficiency in English College Entrance Examinations Using Web Raschonline: A Comparative Study (Preprint)
Citations: 0 · Authors: 3 · Year: 2024
Abstract
BACKGROUND
ChatGPT, developed by OpenAI, is a state-of-the-art large language model that has performed exceptionally well in a range of specialized applications. Despite the growing popularity and efficacy of artificial intelligence (AI), few comprehensive studies have used Rasch analysis to assess ChatGPT's competence on the English tests of college entrance examinations, particularly that of ChatGPT-4V, which handles both images and text.
OBJECTIVE
This study aims to (1) showcase the utility of the Rasch analysis website RaschOnline and (2) determine the grade ChatGPT-4V achieves on questions involving both images and text, compared with a typical sample.
METHODS
ChatGPT's capability was evaluated using 46 multiple-choice questions (MCQs) from the English tests of Taiwan's 2024 college entrance examinations. Using a Rasch model, 300 simulated students with normally distributed abilities were generated for comparison with ChatGPT's responses. To address the research objectives, RaschOnline was used to create six visual displays: an item-difficulty plot, differential item functioning (DIF), item characteristic curves (ICCs), a performance plot, a Wright map, and a KIDMAP.
RESULTS
The findings were as follows:
(1) On the Wright map, ChatGPT-4V's ability was 4.05 logits, with a correct response rate of 0.98; its only error was on item 35, which required visual identification within a reading comprehension passage.
(2) Item 35 was the most difficult item (correct response rate = 22% = 67/301; 1.59 logits on the item-difficulty plot).
(3) The KIDMAP showed an aberrant response by ChatGPT-4V on item 35 (Z = -3.42; |Z| > 2.0, P < .05).
(4) Neither DIF nor ability differences were observed between the two groups (P > .05 and P = .454, respectively), although differences across strata were significant.
(5) ChatGPT-4V was classified as Grade A, standing out at the top of the performance plot compared with its counterparts.
(6) Item 35 showed a good fit to the Rasch model, with an infit mean square (MNSQ) of 0.99.
CONCLUSIONS
Using RaschOnline, this study provides evidence that ChatGPT-4V can achieve grade A when compared with a typical sample. After the correct answer to item 35 was discussed with ChatGPT-4V, both ChatGPT-4V and ChatGPT 4.0 achieved a 100% correct response rate. ChatGPT-4V thus showed excellent proficiency in answering the image-and-text MCQs from the English tests of Taiwan's 2024 college entrance examinations.
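The Rasch quantities cited in the abstract (abilities and difficulties in logits, simulated examinees, infit MNSQ, and per-response Z-scores) can be illustrated with a minimal Python sketch. It assumes the dichotomous Rasch model, simulated abilities drawn from N(0, 1), and hypothetical difficulties for every item except item 35, whose 1.59 logits come from the abstract. It is not RaschOnline's implementation, and the function names (`rasch_p`, `standardized_residual`, `simulate`, `item_infit_mnsq`) are hypothetical.

```python
import math
import random


def rasch_p(theta: float, b: float) -> float:
    """P(correct) under the dichotomous Rasch model; theta and b are in logits."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def standardized_residual(x: int, theta: float, b: float) -> float:
    """z = (x - P) / sqrt(P * (1 - P)) for a single person-item response."""
    p = rasch_p(theta, b)
    return (x - p) / math.sqrt(p * (1.0 - p))


def simulate(n_persons: int, difficulties: list[float], seed: int = 2024):
    """Draw abilities from N(0, 1), then 0/1 responses from the Rasch model."""
    rng = random.Random(seed)
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]
    data = [[1 if rng.random() < rasch_p(t, b) else 0 for b in difficulties]
            for t in thetas]
    return thetas, data


def item_infit_mnsq(data, thetas, difficulties, item: int) -> float:
    """Infit MNSQ: summed squared residuals divided by summed model variances."""
    b = difficulties[item]
    num = den = 0.0
    for row, t in zip(data, thetas):
        p = rasch_p(t, b)
        num += (row[item] - p) ** 2
        den += p * (1.0 - p)
    return num / den


if __name__ == "__main__":
    item_rng = random.Random(0)
    # Hypothetical difficulties for the 46 MCQs; only item 35 (index 34) is from the abstract.
    difficulties = [item_rng.uniform(-2.0, 1.5) for _ in range(46)]
    difficulties[34] = 1.59
    thetas, data = simulate(300, difficulties)

    p35 = sum(row[34] for row in data) / len(data)
    print(f"Simulated correct rate on item 35: {p35:.2f}")
    print(f"Infit MNSQ of item 35: {item_infit_mnsq(data, thetas, difficulties, 34):.2f}")
    # The abstract's 0.98 correct rate corresponds to 45 of 46 items answered correctly.
    print(f"ChatGPT-4V correct rate: {45 / 46:.2f}")
    # A 4.05-logit examinee missing a 1.59-logit item yields a large negative z.
    print(f"Z for missing item 35: {standardized_residual(0, 4.05, 1.59):.2f}")
```

Plugging ChatGPT-4V's reported ability (4.05 logits) and item 35's difficulty (1.59 logits) into `standardized_residual` for a wrong answer gives about -3.42, in line with the KIDMAP value in the abstract; this suggests, but does not confirm, that RaschOnline's Z-score is the per-response standardized residual. Because the sketch uses the true generating parameters rather than estimated ones, the simulated item's infit MNSQ should land near 1.0.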
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,245 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,102 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,468 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,429 citations