This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Ability of AI detection tools and humans to accurately identify different forms of AI-generated written content
0
Citations
9
Authors
2025
Year
Abstract
BACKGROUND: The increasing use of artificial intelligence (AI) by scholars presents a pressing challenge to healthcare publishing. While legitimate use can potentially accelerate scholarship, unethical approaches also exist, leading to factually inaccurate and biased text that may degrade scholarship. Numerous online AI detection tools exist that provide a percentage score of AI use. These can assist authors and editors in navigating this landscape. In this study, we compared the scores from three AI detection tools (ZeroGPT, PhraslyAI, and Grammarly AI Detector) across five plausible conditions of AI use and evaluated them against human assessments. METHODS: Thirty open-access articles published in the journals Advances in Simulation and Simulation in Healthcare prior to 2022 were selected, and the article introductions were extracted. Five experimental conditions were examined: (1) 100% human written; (2) human written, light AI editing; (3) human written, heavy AI editing; (4) AI written text from human content; and (5) 100% AI written from article title. The resulting materials were assessed by three open-access AI detection tools and five blinded human raters. Results were summarized descriptively and compared using repeated measures analysis of variance (ANOVA), intraclass correlation coefficients (ICC), and Bland-Altman plots. RESULTS: The three AI detection tools were able to differentiate between the five test conditions (p < 0.001 for all) but varied significantly in absolute score, with ICCs ranging from 0.57 to 0.95, raising concerns regarding the overall reliability of these tools. Human scoring was far less consistent, with an overall accuracy of 19%, indistinguishable from chance. CONCLUSION: While existing AI detection tools can meaningfully distinguish plausible AI use conditions, reliability across these tools is variable. Human scoring accuracy is uniformly low. Use of AI detection tools by scholars and journal editors may assist in identifying potentially unethical use, but they should not be relied upon alone at this time.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,545 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,436 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,935 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,589 citations