OpenAlex · Updated hourly · Last updated: 31.03.2026, 11:58

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

AI misuse of retracted literature: A comparative study of ChatGPT4o, deepseek, and grok 3 in stem cell research

2025 · 0 citations · Die Naturwissenschaften · Open Access

0 citations · 8 authors · Year: 2025

Abstract

DeepSeek and Grok 3 have emerged as strong competitors to established AI models, particularly the widely adopted ChatGPT. Accurate handling of data from retracted scientific articles has proven to be a significant challenge for AI assistants in scientific research. It is therefore critical to understand whether and how these three AI models handle information from retracted articles when answering scientific questions. We collected retracted articles, used the AI models to generate questions, and analyzed their answers, which were then compared and evaluated across the three models. Here we show that the three models drew on 84 of 93 retracted articles in their answers about stem cells. ChatGPT4o retrieved 74 of 93 (80%) articles and recognized the retraction status of 46 (62%) of them. DeepSeek found only one retracted article and did not recognize its retraction status. Grok 3 retrieved 69 (74%) articles and recognized the retraction status of 46 (67%) of them. When the retracted articles were not identified, ChatGPT fabricated articles in 5 of 19 (26%) of its answers, and Grok 3 fabricated 15 articles in 24 (63%) of its answers. In 82 of 93 (88%) answers, DeepSeek fabricated articles in various forms. The answering styles of ChatGPT4o, DeepSeek, and Grok 3 can be characterized as accurate and straightforward, tangential and reliant on guesswork, and comprehensive and detailed, respectively. Analysis with non-retracted articles revealed similar patterns across these models. This study suggests that, while no model is perfect, DeepSeek performed the worst when facing in-depth, real-world scientific challenges. Much improvement is needed before any of these AI models becomes problem-free and reliably valuable for scientists.
