OpenAlex · Updated hourly · Last updated: 17.03.2026, 00:19

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Accuracy and hallucination of DeepSeek and ChatGPT in scientific figure interpretation and reference retrieval

2025 · 0 citations · Open Access

Citations: 0

Authors: 16

Year: 2025

Abstract

Artificial intelligence-based large language models (AI-LLMs) are increasingly used in biomedical research, but concerns remain regarding their accuracy and reliability, particularly in interpreting scientific data and generating references. This study assessed the performance of three AI-LLMs—DeepSeek-R1, ChatGPT-4o, and Deep Research—in interpreting scientific figures and retrieving bibliographic references. Fifteen figures were analyzed using five parameters: relevance, clarity, depth, focus, and coherence. Reference accuracy was evaluated across seven topics, and hallucination scores were calculated based on errors in titles, DOIs, journals, authors, or publication dates. ChatGPT-4o significantly outperformed DeepSeek-R1 in image interpretation (p < 0.001). In reference retrieval, DeepSeek-R1 had the highest hallucination rate (91.43%), while ChatGPT-4o and Deep Research had lower rates (39.14% and 26.57%, respectively), with Deep Research producing the most accurate references. Although ChatGPT-4o and Deep Research showed better overall performance, the presence of hallucinations in all models highlights the need to carefully verify AI-generated content in academic contexts and improve AI reference generation tools.

Similar works