OpenAlex · Updated hourly · Last updated: 15.03.2026, 20:49

This is an overview page with metadata for this scientific work. The full article is available from the publisher.

Abstract 4368245: Feasibility and Utility of Large Language Models Based on DeepSeek-R1 and ChatGPT-4o for the Interpretation of Cardiac Magnetic Resonance Reports: A Real-World Pilot Study

2025 · 0 citations · Circulation
Open full text at the publisher

Citations: 0 · Authors: 1 · Year: 2025

Abstract

Background: Large language models (LLMs) are a promising tool for interpreting cardiac magnetic resonance (CMR) reports into more accessible language for patients, owing to their advanced natural language understanding capabilities. This study aimed to assess the feasibility and utility of two LLMs (DeepSeek-R1 and ChatGPT-4o) in interpreting CMR reports and to compare their performance across specific dimensions.

Methods: In this prospective pilot study, 110 patients undergoing CMR at Fuwai Hospital (Beijing, China) between March and April 2025 were consecutively enrolled. Each structured original CMR report was randomly assigned to one of the two pre-trained LLMs to generate an LLM-report. Structured Likert-scale questionnaires were developed to assess the comprehensibility and quality of the LLM-reports: patients evaluated LLM-report understanding across four dimensions, while CMR radiologists assessed LLM-report quality across five dimensions. The reliability and validity of the questionnaire were tested using Cronbach's α, the Kaiser-Meyer-Olkin (KMO) measure, Bartlett's test of sphericity, and exploratory factor analysis. Bonferroni correction was applied to adjust for multiple comparisons.

Results: Ultimately, 100 LLM-reports were analyzed (mean age 48.82 ± 12.569 years; 72% male; hypertrophic cardiomyopathy 34%, dilated cardiomyopathy 33%, coronary artery disease 33%), with 50 interpreted by DeepSeek-R1 and 50 by ChatGPT-4o. The questionnaire demonstrated excellent internal reliability and construct validity, with a Cronbach's α of 0.849, a KMO value of 0.803, and a cumulative variance explanation rate of 68.3%. Compared with the original reports, LLM-reports significantly improved scores across all four patient-rated dimensions (all p < 0.013). However, no significant differences were seen between reports interpreted by DeepSeek-R1 and ChatGPT-4o across these dimensions (all p > 0.013). In the quality assessment, no significant differences were seen between the two models across all quality dimensions (all p > 0.01).

Conclusion: This pilot study represents the first direct comparison of two LLMs in interpreting structured CMR reports, integrating both patient feedback and radiologist evaluation. The findings suggest that LLMs show good feasibility and utility in interpreting CMR reports into patient-accessible language, with comparable performance and quality between DeepSeek-R1 and ChatGPT-4o.
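The reliability and significance-threshold calculations in the Methods can be illustrated with a minimal sketch. Cronbach's α measures the internal consistency of a Likert-scale questionnaire (the study reports α = 0.849), and a Bonferroni correction over the four patient-rated dimensions yields 0.05 / 4 = 0.0125, consistent with the p < 0.013 cutoff used in the Results. The respondent scores below are synthetic illustration data, not data from the study.

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a questionnaire.

    items: one inner list per questionnaire item, each holding
    the scores given by every respondent for that item.
    """
    k = len(items)            # number of items (dimensions)

    def sample_var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    # Sum of per-item variances
    item_var_sum = sum(sample_var(it) for it in items)
    # Variance of each respondent's total score across items
    n = len(items[0])
    totals = [sum(items[i][j] for i in range(k)) for j in range(n)]
    total_var = sample_var(totals)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Four hypothetical items rated 1-5 by six hypothetical respondents
scores = [
    [4, 5, 3, 4, 5, 4],
    [4, 4, 3, 5, 5, 4],
    [3, 5, 2, 4, 4, 3],
    [5, 5, 3, 4, 5, 4],
]
alpha = cronbach_alpha(scores)

# Bonferroni-corrected threshold for four comparisons
bonferroni_threshold = 0.05 / 4  # 0.0125, i.e. the p < 0.013 cutoff

print(round(alpha, 3), bonferroni_threshold)
```

Values of α above roughly 0.8 are conventionally read as good internal consistency, which is how the study's reported α of 0.849 supports the questionnaire's reliability.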

Similar works

Authors

Institutions

Topics

Artificial Intelligence in Healthcare and Education · Cardiovascular Health and Risk Factors · Machine Learning in Healthcare