This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
ChatGPT and DeepSeek: Strengths, Limitations, and the Future of Generative AI
Citations: 0
Authors: 6
Year: 2025
Abstract
Artificial Intelligence (AI) is reshaping the world through automation and intelligent decision-making. One popular field in AI is autoregressive Large Language Models (LLMs), such as ChatGPT and DeepSeek. Autoregressive LLMs currently offer substantial benefits across diverse fields, including mathematics, science, medicine, programming, and literature. They can solve mathematical problems, correct grammatical errors, paraphrase text, translate languages, prove scientific laws, generate medical reports, write code, and debug programs. Autoregressive LLMs operate by predicting subsequent tokens based on preceding text, enabling them to generate coherent, contextually relevant responses. ChatGPT, developed by OpenAI, leverages a robust transformer architecture combined with chain-of-thought reasoning to produce nuanced, high-quality text. In contrast, DeepSeek employs advanced techniques such as model distillation and a mixture-of-experts (MoE) approach, which allow it to deliver competitive performance at significantly lower computational cost. These capabilities enable both models to solve a wide range of complex problems effectively. In this study, we conducted three key tasks: (1) a comparative background analysis of the ChatGPT and DeepSeek Generative AI (GAI) models; (2) extensive experiments across a range of generative tasks; and (3) a comprehensive user survey. Specifically, we performed comparative experiments involving ChatGPT, DeepSeek, and human-generated content across multiple domains, including education, science, medicine, and programming. For one experiment, we generated text on the topic "Computer" using both ChatGPT and DeepSeek and systematically compared these outputs against human-authored writing. ChatGPT achieved ROUGE-L, BLEU, and BERTScore values of 16.87%, 1.85%, and 11.91%, respectively, while DeepSeek attained 15.34%, 0.96%, and 11.91%.
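The next-token prediction described above can be illustrated with a toy sketch. The bigram "model" below is purely hypothetical and stands in for the transformer that a real LLM would use to estimate next-token probabilities; all function names here are illustrative, not part of either system:

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    # Count how often each token follows each other token.
    counts = defaultdict(Counter)
    tokens = corpus.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, max_tokens=5):
    # Autoregressive loop: each new token is predicted from the text so far.
    out = [start]
    for _ in range(max_tokens):
        successors = counts.get(out[-1])
        if not successors:
            break
        out.append(successors.most_common(1)[0][0])  # greedy: most likely next token
    return " ".join(out)
```

For example, `generate(train_bigrams("a b c a b c a b"), "a", 3)` greedily extends the prompt one token at a time, the same loop structure an LLM uses with a far richer probability model.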
In the programming domain, we assessed code-generation performance using problem descriptions sourced from the widely used online judge platform Codeforces. ChatGPT achieved ROUGE-L, BLEU, and BERTScore values of 36%, 34%, and 77%, respectively; DeepSeek outperformed ChatGPT on this task, achieving 55%, 51%, and 84%. Both models also performed strongly on medical diagnosis report generation, each producing reports with approximately 90% accuracy, as verified by a medical expert. Additionally, we evaluated their mathematical reasoning by tasking both models with solving a classical Fourier series problem; a mathematics expert confirmed the correctness of both models' solutions. To gain user perspectives, we conducted a comprehensive survey of students, educators, and researchers to understand how ChatGPT and DeepSeek support learning and problem-solving. The survey included open-ended questions that invited participants to reflect on the strengths, limitations, and future potential of GAI. We present and briefly analyze the findings derived from these responses.
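As a rough illustration of one of the metrics reported above, ROUGE-L scores the longest common subsequence (LCS) of tokens shared by a candidate and a reference text. The sketch below is a minimal pure-Python version of the F1 variant, not the authors' evaluation pipeline (which may use tokenization and library implementations that differ in detail):

```python
def lcs_length(a, b):
    # Classic dynamic-programming table for longest common subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def rouge_l_f1(candidate, reference):
    # ROUGE-L F1 over whitespace tokens: harmonic mean of LCS-based
    # precision (vs. candidate length) and recall (vs. reference length).
    c, r = candidate.split(), reference.split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

An identical candidate and reference score 1.0; fully disjoint texts score 0.0, with partial overlaps falling in between.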
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,393 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,259 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,688 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,502 citations