This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Hype versus reality of artificial intelligence (AI) platforms: unmasking the limitations of large language models in the use of scientific writing and reporting
Citations: 0
Authors: 3
Year: 2026
Abstract
Large language models (LLMs) have demonstrated remarkable capabilities across various natural language processing tasks. However, their application to complex, domain-specific summarization, such as scientific conference presentations, remains constrained by limitations in long-context understanding, factual accuracy, and content attribution. In this study, we systematically evaluated five state-of-the-art LLMs (ChatGPT, DeepSeek, Gemini, Grok, and Qwen), each tested in both standard and reasoning-augmented configurations. All models were tasked with summarizing a full-length audio transcript of approximately 160,000 words from 64 speakers at the 2024 annual meeting of the American Association of Extracellular Vesicles (AAEV 2024). While the models were capable of extracting high-level themes and generating readable summaries, we observed persistent deficiencies in speaker coverage, affiliation attribution, and reference citation. Gemini 2.5 Pro achieved the best overall performance, yet even the top-performing models failed to summarize up to one-third of the speakers and did not produce accurate or complete reference citations. Incorporating reasoning processes led to measurable improvements in summarization quality across most LLMs. These findings underscore that current LLMs are not yet capable of fully autonomous scientific summarization. Our results highlight the need for more advanced reasoning mechanisms and for multi-agent architectures composed of specialized modules for speaker classification, citation verification, and content synthesis. Until such systems mature, expert oversight remains essential to meet the rigorous standards of biomedical communication.
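The abstract reports speaker coverage as a headline metric. As a minimal sketch of how such a check could be computed, assuming the conference program provides a speaker list and that fuzzy surname matching is acceptable, the Python below illustrates the idea; the function names, threshold, and example data are hypothetical and do not represent the study's actual evaluation pipeline.

```python
# Hypothetical sketch of a speaker-coverage check: given the list of
# speakers from the conference program and a model-generated summary,
# count how many speakers the summary mentions. The matching rule,
# names, and data here are illustrative assumptions.
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and strip punctuation so name matching is robust."""
    return re.sub(r"[^a-z\s]", "", text.lower())


def speaker_covered(speaker: str, summary: str, threshold: float = 0.9) -> bool:
    """True if the speaker's surname appears (approximately) in the summary."""
    surname = normalize(speaker).split()[-1]
    return any(
        SequenceMatcher(None, surname, token).ratio() >= threshold
        for token in normalize(summary).split()
    )


def speaker_coverage(speakers: list[str], summary: str) -> float:
    """Fraction of program speakers mentioned in the generated summary."""
    covered = sum(speaker_covered(s, summary) for s in speakers)
    return covered / len(speakers)


if __name__ == "__main__":
    # Placeholder speakers and summary text, not data from the study.
    program_speakers = ["Jane Doe", "Alexei Petrov", "Mei-Ling Chen"]
    generated_summary = "Dr. Doe presented on exosome biogenesis; Chen discussed EV isolation."
    print(f"Coverage: {speaker_coverage(program_speakers, generated_summary):.0%}")
    # -> Coverage: 67% (Petrov is missing, mirroring the coverage gaps reported above)
```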
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,436 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,311 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,753 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,523 citations