This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Development and content validation of the CAREFUL-AI framework for evaluating AI-generated scientific manuscripts: an exploratory cross-platform study
Citations: 0
Authors: 5
Year: 2026
Abstract
The expanding use of Large Language Models (LLMs) in scientific writing raises critical concerns regarding research accountability, integrity, and transparency. This study aimed to develop and content-validate the CAREFUL-AI framework and to apply it to evaluate the scientific quality of AI-generated research manuscripts. A three-round Delphi process with 10 multidisciplinary experts was conducted to develop the CAREFUL-AI framework. Content validity was assessed using the item-level content validity index (I-CVI) and the scale-level content validity index (S-CVI/Ave). Using standardized, locked prompts, 150 complete scientific research manuscripts across five study designs were generated by six freely accessible LLM platforms: ChatGPT (GPT-5.2), Claude (Claude 4.5), Gemini (Gemini 3 Flash), Grok (Grok 4.1), DeepSeek (DeepSeek-V3.2), and Meta AI (Llama 4 Maverick). Manuscripts were evaluated by the respective prompt administrators using the CAREFUL-AI framework. The framework demonstrated strong content validity (I-CVI: 0.83–1.00; S-CVI/Ave: 0.92). Overall, 38.0% of manuscripts were rated high quality, 52.7% moderate quality, and 9.3% low quality. Claude generated the highest proportion of high-quality manuscripts, while all low-quality manuscripts were produced by Meta AI. Reproducibility and uncertainty handling consistently received the lowest domain scores across models. The CAREFUL-AI framework provides researchers with a structured tool to critically appraise AI-generated manuscripts, helping safeguard methodological rigor, transparency, and evidence reliability in scientific writing. Substantial variability exists in the quality of AI-generated manuscripts. Although Claude demonstrated comparatively stronger performance, persistent deficiencies in key integrity domains indicate that AI-assisted manuscript generation requires robust human oversight. CAREFUL-AI provides a content-validated framework to support ethical governance and editorial oversight of AI-assisted research.
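As an illustration of the content validity indices named in the abstract, the following is a minimal sketch of how I-CVI and S-CVI/Ave are commonly computed. It assumes a 4-point relevance scale on which ratings of 3 or 4 count as "relevant"; the abstract does not state the rating scale used, and the item names and ratings below are invented for illustration only, not data from the study.

```python
from statistics import mean


def item_cvi(ratings, relevant_min=3):
    """I-CVI: share of experts who rated the item as relevant.

    Assumes a 4-point relevance scale where ratings >= relevant_min
    count as relevant (an assumption, not stated in the abstract).
    """
    if not ratings:
        raise ValueError("at least one expert rating is required")
    return sum(r >= relevant_min for r in ratings) / len(ratings)


def scale_cvi_ave(ratings_by_item, relevant_min=3):
    """S-CVI/Ave: mean of the I-CVI values across all items."""
    return mean(item_cvi(r, relevant_min) for r in ratings_by_item.values())


# Hypothetical ratings from 10 experts for three made-up items.
example_ratings = {
    "item_a": [4, 4, 3, 4, 4, 3, 4, 4, 4, 3],  # 10 of 10 rate it relevant
    "item_b": [4, 3, 2, 4, 4, 3, 4, 4, 3, 4],  # 9 of 10 rate it relevant
    "item_c": [3, 4, 4, 2, 4, 3, 2, 4, 4, 3],  # 8 of 10 rate it relevant
}

if __name__ == "__main__":
    for name, ratings in example_ratings.items():
        print(name, round(item_cvi(ratings), 2))
    print("S-CVI/Ave", round(scale_cvi_ave(example_ratings), 2))
```

Under this convention, an I-CVI is a simple proportion per item, and S-CVI/Ave averages those proportions, so a single weakly rated item lowers the scale-level value only modestly.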
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,719 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,628 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,176 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,880 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations