OpenAlex · Updated hourly · Last updated: 17.03.2026, 00:19

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Accuracy and hallucination of DeepSeek and ChatGPT in scientific figure interpretation and reference retrieval

2025 · 0 citations · Open Access

Citations: 0

Authors: 16

Year: 2025

Abstract

Artificial intelligence-based large language models (AI-LLMs) are increasingly used in biomedical research, but concerns remain regarding their accuracy and reliability, particularly in interpreting scientific data and generating references. This study assessed the performance of three AI-LLMs—DeepSeek-R1, ChatGPT-4o, and Deep Research—in interpreting scientific figures and retrieving bibliographic references. Fifteen figures were analyzed using five parameters: relevance, clarity, depth, focus, and coherence. Reference accuracy was evaluated across seven topics, and hallucination scores were calculated based on errors in titles, DOIs, journals, authors, or publication dates. ChatGPT-4o significantly outperformed DeepSeek-R1 in image interpretation (p < 0.001). In reference retrieval, DeepSeek-R1 had the highest hallucination rate (91.43%), while ChatGPT-4o and Deep Research had lower rates (39.14% and 26.57%, respectively), with Deep Research producing the most accurate references. Although ChatGPT-4o and Deep Research showed better overall performance, the presence of hallucinations in all models highlights the need to carefully verify AI-generated content in academic contexts and improve AI reference generation tools.

Similar works