This is an overview page with metadata for this scientific article. The full article is available from the publisher.
MEDAI-LLM-SUMM: a reporting checklist for medical text summarization studies using large language models
Citations: 0
Authors: 12
Year: 2026
Abstract
Background: Medical text summarization using large language models (LLMs) reached an inflection point in 2024–2025, with adapted models demonstrating the capability to match or exceed human expert performance on specific tasks. However, critical gaps persist in safety validation, evaluation frameworks, and clinical deployment readiness. A comprehensive review revealed that only 7% of studies conducted external validation and only 3% performed patient safety assessments, with hallucination rates ranging from 1.47% to 61.6%. Existing reporting guidelines, including CONSORT-AI, SPIRIT-AI, TRIPOD-LLM, and DEAL, do not adequately address the specific requirements of medical text summarization tasks.

Objective: To develop MEDAI-LLM-SUMM, the first specialized reporting checklist for research on medical text summarization using LLMs, addressing critical gaps in existing reporting standards.

Methods: A modified iterative consensus approach was employed, comprising three sequential stages: (1) a systematic literature review of 216 publications from PubMed and eLibrary (2023–2025) following PRISMA guidelines, together with an analysis of existing reporting standards (TRIPOD-LLM, DEAL, CONSORT-AI, SPIRIT-AI, TRIPOD+AI, CLAIM, STARD-AI); (2) development of an initial 44-item, 7-section checklist by a supervisory group; (3) three rounds of face-to-face consensus discussions with a multidisciplinary expert panel of 11 specialists (3 radiologists, 2 clinicians, 3 medical informatics experts, 1 biostatistician, and 2 medical LLM developers). The consensus criterion required unanimous agreement from all panel members.
Results: The final MEDAI-LLM-SUMM checklist comprises 24 items organized into six sections: (A) Clinical Validity (4 items addressing clinical task definition, expert involvement, hypothesis formulation, and medical expertise requirements); (B) Model Selection (5 items covering model justification, system requirements, deployment environment, LLM-as-judge approach, and prompt documentation); (C) Data (3 items on datasets, reference summaries with expert consensus, and data stratification); (D) Quality Assessment (8 items including evaluation metrics, clinical metrics, expert evaluation, hallucination detection, LLM-judge assessment, sample size justification, pilot testing, and limitations documentation); (E) Safety (2 items on ethical approval and data anonymization); and (F) Data Availability (2 items on code and dataset accessibility). Comparative analysis with six existing reporting standards demonstrated that MEDAI-LLM-SUMM uniquely addresses hallucination assessment requirements, reference summary creation methodology, LLM-as-judge validation protocols, and detailed pilot testing specifications.
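The six-section breakdown above can be sketched as a simple data structure; this is a hypothetical illustration only, with section names and item counts taken directly from the abstract (the actual checklist items are in the full article):

```python
# Item counts per section of the final MEDAI-LLM-SUMM checklist,
# as stated in the abstract: 24 items across six sections A-F.
checklist_sections = {
    "A: Clinical Validity": 4,
    "B: Model Selection": 5,
    "C: Data": 3,
    "D: Quality Assessment": 8,
    "E: Safety": 2,
    "F: Data Availability": 2,
}

# Sanity-check the reported totals.
total_items = sum(checklist_sections.values())
print(len(checklist_sections), total_items)  # 6 sections, 24 items
```

The counts sum to the 24 items reported for the final checklist, down from the initial 44-item, 7-section draft described in the Methods.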