OpenAlex · Updated hourly · Last updated: 06.04.2026, 03:41

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

ChatGPT and DeepSeek: Strengths, Limitations, and the Future of Generative AI

2025 · 0 citations · Open Access

Citations: 0 · Authors: 6 · Year: 2025

Abstract

Artificial Intelligence (AI) is reshaping the world through automation and intelligent decision-making. One of the most popular fields in AI is autoregressive Large Language Models (LLMs), such as ChatGPT and DeepSeek. Autoregressive LLMs currently offer substantial benefits across diverse fields, including mathematics, science, medicine, programming, and literature. They can solve mathematical problems, correct grammatical errors, paraphrase text, translate languages, prove scientific laws, generate medical reports, write code, and debug programs. Autoregressive LLMs operate by predicting subsequent tokens based on the preceding text, enabling them to generate coherent, contextually relevant responses. ChatGPT, developed by OpenAI, leverages a robust transformer architecture combined with chain-of-thought reasoning to produce nuanced, high-quality text. In contrast, DeepSeek employs advanced techniques such as model distillation and a mixture-of-experts (MoE) approach, which allow it to deliver competitive performance at significantly lower computational cost. These capabilities enable both models to solve diverse complex problems effectively. In this study, we conducted three key tasks: (1) a comparative background analysis of the ChatGPT and DeepSeek Generative AI (GAI) models; (2) extensive experiments across a range of generative tasks; and (3) a comprehensive user survey. Specifically, we performed comparative experiments involving ChatGPT, DeepSeek, and human-generated content across multiple domains, including education, science, medicine, and programming. In one experiment, we generated text on the topic "Computer" using both ChatGPT and DeepSeek and systematically compared these outputs against human-authored writing. ChatGPT achieved ROUGE-L, BLEU, and BERT scores of 16.87%, 1.85%, and 11.91%, respectively, while DeepSeek attained scores of 15.34%, 0.96%, and 11.91%.
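The autoregressive generation the abstract describes, where each new token is predicted from the tokens before it, can be illustrated with a minimal sketch. This toy bigram model is an assumption for illustration only; the paper's models are transformer-based, not bigram counters:

```python
from collections import defaultdict

def train_bigram(tokens):
    # Count how often each token follows each other token.
    counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, max_len=5):
    # Autoregressive loop: each new token is chosen greedily,
    # conditioned only on the immediately preceding token.
    out = [start]
    for _ in range(max_len - 1):
        followers = counts.get(out[-1])
        if not followers:
            break
        out.append(max(followers, key=followers.get))
    return out

corpus = "the model predicts the next token given the previous token".split()
model = train_bigram(corpus)
print(" ".join(generate(model, "the")))
```

Real LLMs replace the bigram table with a transformer that conditions on the entire preceding context, but the token-by-token generation loop is the same in spirit.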
In the programming domain, we assessed code generation performance using problem descriptions sourced from the widely used online judge platform Codeforces. ChatGPT achieved ROUGE-L, BLEU, and BERT scores of 36%, 34%, and 77%, respectively. DeepSeek outperformed ChatGPT in this task, achieving scores of 55%, 51%, and 84%. Both models also demonstrated strong performance in medical diagnosis report generation, each producing reports with approximately 90% accuracy, as verified by a medical expert. Additionally, we evaluated their mathematical reasoning by tasking both models with solving a classical Fourier series problem. A mathematics expert confirmed the correctness of the solutions provided by both models. To gain user perspectives, we conducted a comprehensive survey involving students, educators, and researchers to understand how ChatGPT and DeepSeek support learning and problem-solving. The survey included open-ended questions that invited participants to reflect on the strengths, limitations, and future potential of GAI. We present and briefly analyze the findings derived from these responses.
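The ROUGE-L figures quoted above are based on longest-common-subsequence overlap between generated and reference text. A minimal sketch of the metric follows; this is an illustrative re-implementation, not the authors' evaluation pipeline, which presumably uses a standard scoring package:

```python
def lcs_len(a, b):
    # Dynamic-programming longest-common-subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[-1][-1]

def rouge_l(candidate, reference):
    # F-measure over LCS-based precision and recall, token level.
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_len(cand, ref)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(cand), lcs / len(ref)
    return 2 * prec * rec / (prec + rec)

score = rouge_l("the cat sat on the mat", "the cat lay on the mat")
print(round(score, 4))  # -> 0.8333
```

BLEU and BERTScore differ in mechanism (n-gram precision with a brevity penalty, and contextual-embedding similarity, respectively), but all three reduce a candidate/reference pair to a single overlap score in the same way.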

Related works