This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Ability of AI detection tools and humans to accurately identify different forms of AI-generated written content
0
Citations
9
Authors
2025
Year
Abstract
BACKGROUND: The increasing use of artificial intelligence (AI) by scholars presents a pressing challenge to healthcare publishing. While legitimate use can potentially accelerate scholarship, unethical approaches also exist, leading to factually inaccurate and biased text that may degrade scholarship. Numerous online AI detection tools exist that provide a percentage score of AI use. These can assist authors and editors in navigating this landscape. In this study, we compared the scores from three AI detection tools (ZeroGPT, PhraslyAI, and Grammarly AI Detector) across five plausible conditions of AI use and evaluated them against human assessments. METHODS: Thirty open-access articles published in the journals Advances in Simulation and Simulation in Healthcare prior to 2022 were selected, and the article introductions were extracted. Five experimental conditions were examined: (1) 100% human written; (2) human written, light AI editing; (3) human written, heavy AI editing; (4) AI written text from human content; and (5) 100% AI written from article title. The resulting materials were assessed by three open-access AI detection tools and five blinded human raters. Results were summarized descriptively and compared using repeated measures analysis of variance (ANOVA), intraclass correlation coefficients (ICC), and Bland-Altman plots. RESULTS: The three AI detection tools were able to differentiate between the five test conditions (p < 0.001 for all) but varied significantly in absolute score, with ICCs ranging from 0.57 to 0.95, raising concerns regarding the overall reliability of these tools. Human scoring was far less consistent, with an overall accuracy of 19%, indistinguishable from chance. CONCLUSION: While existing AI detection tools can meaningfully distinguish plausible AI use conditions, reliability across these tools is variable. Human scoring accuracy is uniformly low. Use of AI detection tools by scholars and journal editors may assist in identifying potentially unethical use, but they should not be relied upon alone at this time.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,545 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,436 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,935 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,589 citations