OpenAlex · Updated hourly · Last updated: 12.04.2026, 05:29

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Hype versus reality of artificial intelligence (AI) platforms: unmasking the limitations of large language models in the use of scientific writing and reporting

2026 · 0 citations · Extracellular Vesicle · Open Access
Open full text at publisher

0 citations · 3 authors · Year: 2026

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities across various natural language processing tasks. However, their application to complex, domain-specific summarization, such as scientific conference presentations, remains constrained by limitations in long-context understanding, factual accuracy, and content attribution. In this study, we systematically evaluated five state-of-the-art LLMs (ChatGPT, DeepSeek, Gemini, Grok, and Qwen), each tested in both standard and reasoning-augmented configurations. All models were tasked with summarizing a full-length audio transcript comprising approximately 160,000 words from 64 speakers at the 2024 annual meeting of the American Association of Extracellular Vesicles (AAEV 2024). While the models were capable of extracting high-level themes and generating readable summaries, we observed persistent deficiencies in speaker coverage, affiliation attribution, and reference citation. Gemini 2.5 Pro achieved the best overall performance, yet even the top-performing models failed to summarize up to one-third of the speakers and did not produce accurate or complete reference citations. Incorporating reasoning processes led to measurable improvements in summarization quality across most LLMs. These findings underscore that current LLMs are not yet capable of fully autonomous scientific summarization. Our results highlight the need for more advanced reasoning mechanisms and the development of multi-agent architectures composed of specialized modules for speaker classification, citation verification, and content synthesis. Until such systems mature, expert oversight remains essential to meet the rigorous standards of biomedical communication.
