OpenAlex · Updated hourly · Last updated: 17.03.2026, 11:19

This is an overview page with metadata for this scientific work. The full article is available from the publisher.

Assessing ChatGPT’s Capability for Multiple Choice Questions Using RaschOnline: Observational Study (Preprint)

2023 · 1 citation · Open Access
Open full text at the publisher

Citations: 1

Authors: 4

Year: 2023

Abstract

BACKGROUND: ChatGPT (OpenAI), a state-of-the-art large language model, has exhibited remarkable performance in various specialized applications. Despite the growing popularity and efficacy of artificial intelligence, there is a scarcity of studies that assess ChatGPT's competence in addressing multiple-choice questions (MCQs) using the KIDMAP of Rasch analysis, a web-based tool used to evaluate ChatGPT's performance in MCQ answering.

OBJECTIVE: This study aims to (1) showcase the utility of the website (Rasch analysis, specifically RaschOnline) and (2) determine the grade achieved by ChatGPT when compared with a normal sample.

METHODS: ChatGPT's capability was evaluated using 10 items from the English test of the 2023 Taiwan college entrance examinations. Under a Rasch model, 300 normally distributed simulated students were generated to compete with ChatGPT's responses. RaschOnline was used to generate 5 visual presentations (item difficulties, differential item functioning, item characteristic curves, a Wright map, and a KIDMAP) to address the research objectives.

RESULTS: The findings revealed the following: (1) the difficulties of the 10 items increased monotonically from easiest to hardest, represented by logits (–2.43, –1.78, –1.48, –0.64, –0.1, 0.33, 0.59, 1.34, 1.7, and 2.47); (2) evidence of differential item functioning was observed between gender groups for item 5 (P=.04); (3) item 5 displayed a good fit to the Rasch model (P=.61); (4) all items demonstrated a satisfactory fit to the Rasch model, indicated by infit mean square errors below the threshold of 1.5; (5) no significant difference was found between gender groups in the measures obtained (P=.83); (6) a significant difference was observed among ability grades (P<.001); and (7) ChatGPT's capability was graded A, surpassing grades B to E.

CONCLUSIONS: Using RaschOnline, this study provides evidence that ChatGPT achieves a grade of A when compared with a normal sample, exhibiting excellent proficiency in answering MCQs from the English test of the 2023 Taiwan college entrance examinations.
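The dichotomous Rasch model behind the analysis described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the 300 simulated respondents and the ten item difficulties (in logits) come from the abstract, while the standard normal ability distribution and the random seed are assumptions (the abstract says only that the simulated sample was normally distributed).

```python
import math
import random

# Item difficulties (logits) reported in the abstract, easiest to hardest.
DIFFICULTIES = [-2.43, -1.78, -1.48, -0.64, -0.1, 0.33, 0.59, 1.34, 1.7, 2.47]

def p_correct(theta, b):
    """Rasch model: probability that a person with ability theta
    answers an item with difficulty b correctly (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def simulate_responses(n_persons=300, seed=0):
    """Simulate right/wrong (1/0) responses for n_persons whose
    abilities are drawn from an assumed standard normal distribution."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)
        row = [1 if rng.random() < p_correct(theta, b) else 0
               for b in DIFFICULTIES]
        data.append(row)
    return data

responses = simulate_responses()
```

In this parameterization, a person whose ability equals an item's difficulty has a 50% chance of answering it correctly, so easier items (more negative logits) are answered correctly more often across the simulated sample, which is what the monotonic difficulty ordering in the results reflects.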


Topics

Artificial Intelligence in Healthcare and Education · Topic Modeling · Text Readability and Simplification