OpenAlex · Updated hourly · Last updated: 17 March 2026, 11:19

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Assessing ChatGPT-4's Proficiency in English College Entrance Examinations Using Web Raschonline: A Comparative Study (Preprint)

2024 · 0 citations · Open Access

Citations: 0 · Authors: 3 · Year: 2024

Abstract

BACKGROUND: ChatGPT, developed by OpenAI, is a state-of-the-art large language model that has demonstrated exceptional performance in various specialized applications. Despite the growing popularity and efficacy of artificial intelligence (AI), no comprehensive study has used Rasch analysis to assess ChatGPT's competence, particularly that of ChatGPT-4V (which handles both images and text), on the English tests of college entrance examinations.

OBJECTIVE: This study aims to (1) showcase the utility of the Rasch analysis website, RaschOnline, and (2) determine the grade achieved by ChatGPT-4V by evaluating its ability to handle questions involving both images and text, in comparison to a typical sample.

METHODS: ChatGPT's capability was evaluated using 46 multiple-choice questions (MCQs) from the English tests conducted for Taiwan's college entrance examinations in 2024. Using a Rasch model, 300 simulated students with normally distributed abilities were generated for comparison with ChatGPT's responses. RaschOnline was used to create six visual presentations to address the research objectives: an item-difficulty plot, a differential item functioning (DIF) plot, an item characteristic curve (ICC), a performance plot, a Wright map, and a KIDMAP.

RESULTS: The findings were as follows: (1) the Wright map placed ChatGPT-4V's ability at 4.05 logits, with a correct response rate of 0.98; the one error was item 35, answered incorrectly during the visual identification step of a reading comprehension passage; (2) item 35 was the most difficult item (correct response rate = 22% = 67/301, at 1.59 logits on the item-difficulty plot); (3) the KIDMAP showed an aberrant response by ChatGPT-4V on item 35, with Z-score = -3.42 (< -2.0, P < .05); (4) no DIF or ability differences were observed between the two groups (P > .05 and P = .454, respectively), although differences were significant across strata; (5) ChatGPT-4V was uniquely classified as Grade A, standing at the top of the performance plot relative to its counterparts; (6) item 35 displayed a good fit to the Rasch model, with an infit mean square error (MNSQ) of 0.99.

CONCLUSIONS: Using RaschOnline, this study provides evidence that ChatGPT-4V can achieve a Grade A when compared to a typical sample. After the correct answer to item 35 was discussed with ChatGPT-4V, a 100% correct response rate was achieved by either ChatGPT-4V or ChatGPT 4.0. ChatGPT-4V showed excellent proficiency in answering MCQs with images and text from the English tests conducted in 2024 for Taiwan's college entrance examinations.
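The Rasch quantities quoted in the abstract can be illustrated with a minimal Python sketch. It assumes the standard dichotomous Rasch model and the usual standardized residual, (observed - expected) / sqrt(variance); the abstract does not state which formulas RaschOnline uses, so this is an illustration and not the authors' implementation. The function names and the evenly spaced item difficulties in the simulation are hypothetical; only the 4.05 and 1.59 logit values, the 300 simulated students, and the 46 items come from the abstract.

```python
import math
import random


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct answer under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def standardized_residual(x: int, theta: float, b: float) -> float:
    """Z-score of an observed response x (0 or 1) against its Rasch expectation."""
    p = rasch_probability(theta, b)
    return (x - p) / math.sqrt(p * (1.0 - p))


def simulate_responses(n_persons: int, difficulties: list, seed: int = 0) -> list:
    """Draw abilities from N(0, 1) and simulate 0/1 responses for every item."""
    rng = random.Random(seed)
    abilities = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]
    return [
        [1 if rng.random() < rasch_probability(theta, b) else 0 for b in difficulties]
        for theta in abilities
    ]


# Values reported in the abstract (in logits).
THETA_CHATGPT = 4.05  # ChatGPT-4V ability on the Wright map
B_ITEM_35 = 1.59      # difficulty of item 35

p_correct = rasch_probability(THETA_CHATGPT, B_ITEM_35)
z_miss = standardized_residual(0, THETA_CHATGPT, B_ITEM_35)
print(f"P(correct on item 35) = {p_correct:.3f}")
print(f"Z-score of a miss     = {z_miss:.2f}")

# Hypothetical difficulties: 46 items evenly spaced between -1.59 and 1.59 logits.
difficulties = [-1.59 + 3.18 * i / 45 for i in range(46)]
responses = simulate_responses(300, difficulties)
print(f"Simulated matrix: {len(responses)} persons x {len(responses[0])} items")
```

Under these assumptions, a respondent at 4.05 logits is expected to answer a 1.59-logit item correctly about 92% of the time, and the standardized residual for a miss works out to roughly -3.42, which is consistent with the Z-score reported for item 35.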

Topics

Artificial Intelligence in Healthcare and Education · COVID-19 diagnosis using AI