OpenAlex · Updated hourly · Last updated: 14.03.2026, 08:08

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Accuracy of Artificial Intelligence Detection Software for Residency Personal Statements

2025 · 0 Citations · Journal of Graduate Medical Education · Open Access

0 Citations · 9 Authors · Year: 2025

Abstract

<b>Background</b> The improvement of generative artificial intelligence (AI) has led to concerns about residency applicants using AI to write personal statements. Because some program directors may value fully human-generated personal statements, they may be inclined to use commercially available AI detection tools. However, the accuracy of AI detection in personal statements is uncertain. <b>Objective</b> To evaluate the accuracy of AI detection tools in identifying AI-generated content within residency personal statements. <b>Methods</b> In 2024, 25 human-generated personal statements were collected from residents in the fields of internal medicine, psychiatry, neurology, and surgery at a single institution. The authors created 25 AI-generated statements with ChatGPT-4o and 25 statements combining AI-generated and human-generated content (mixed content). Four AI detection tools (including free and paid tools) were used to estimate the likelihood that each statement was AI-generated. Summary statistics and multivariate analysis of variance (MANOVA) with post hoc Tukey test were performed. <b>Results</b> AI detection tools varied in the likelihood scores assigned to human-generated personal statements (mean likelihood of statement being AI-generated, min-max range of likelihoods): non-disclosed paid detector (9.7%, min-max: 0-84%), Writer (1.6%, min-max: 0-9%), GPTzero (4.5%, min-max: 3-22%), and ZeroGPT (17.2%, min-max: 0-70.5%). MANOVA and post hoc tests revealed significant differences in likelihoods between the groups (<i>P</i><.001). However, there was overlap between mixed-content and completely AI-generated personal statements. <b>Conclusions</b> The detection tools occasionally assigned high AI likelihood scores to human-generated content and were unable to reliably distinguish mixed-content texts from AI-generated texts.
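The analysis described in the Methods can be illustrated with a small sketch. The study's actual scores are not public, so the data below are simulated, and the paper reports a MANOVA across all four detectors jointly; this simplified stand-in tests a single hypothetical detector with one-way ANOVA plus Tukey's HSD to show the shape of the group comparison (human vs. mixed vs. fully AI-generated, 25 statements each).

```python
# Simplified sketch of the group comparison described in the abstract.
# Scores are simulated (the study's data are not public); the paper used
# MANOVA over four detectors, while this example runs one-way ANOVA and
# a post hoc Tukey HSD test for a single hypothetical detector.
import numpy as np
from scipy.stats import f_oneway, tukey_hsd

rng = np.random.default_rng(0)
# Simulated AI-likelihood scores (%) for 25 statements per group
human = rng.uniform(0, 20, 25)    # human-written: mostly low scores
mixed = rng.uniform(30, 90, 25)   # mixed human/AI content
ai = rng.uniform(60, 100, 25)     # fully AI-generated

# Omnibus test: do the three groups differ in mean likelihood score?
f_stat, p_value = f_oneway(human, mixed, ai)
print(f"ANOVA: F={f_stat:.1f}, p={p_value:.2g}")

# Post hoc Tukey HSD: which specific pairs of groups differ?
res = tukey_hsd(human, mixed, ai)
print(res)
```

With well-separated simulated groups the omnibus test is highly significant, mirroring the <i>P</i><.001 result; the interesting pairwise comparison is mixed vs. AI, where the abstract reports overlapping score ranges.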
