This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Content Validity of AI-Generated Medical Information on Idiopathic Pulmonary Fibrosis (IPF): A Comparative Analysis of ChatGPT-4 and Gemini 1.5 Pro
Citations: 0
Authors: 10
Year: 2025
Abstract
<bold>Background:</bold> IPF is characterized by progressively declining respiratory function and quality of life, with high mortality. Large language models (LLMs) produce coherent medical information, but their accuracy, readability, and adherence to IPF guidelines remain unconfirmed. <bold>Aim:</bold> To evaluate the reliability and accuracy of LLMs in generating medically and clinically relevant content related to IPF. <bold>Methods:</bold> A comparative analysis of ChatGPT-4 and Gemini 1.5 Pro responses to 23 IPF-related questions derived from the ATS/ERS/JRS/ALAT guidelines was conducted. Six independent ILD experts assessed responses for accuracy (DISCERN), reliability (JAMA Benchmark Criteria), readability (Flesch-Kincaid), and guideline adherence. Mann-Whitney U tests and intraclass correlation coefficients (ICC) were used to compare model performance. <bold>Results:</bold> Both LLMs provided partially sufficient responses, with a median JAMA Benchmark score of 2 for both models (p = 0.24). Gemini 1.5 Pro generated higher-quality treatment-related responses than ChatGPT-4, as reflected by significantly higher DISCERN scores of 56 versus 43, respectively (p < 0.001). Regarding readability, both models required college-level comprehension. The ICC analysis revealed significant inter-rater variability, with ChatGPT-4 demonstrating lower agreement (ICC = 0.361) than Gemini 1.5 Pro (ICC = 0.813). <bold>Conclusion:</bold> While both models offer coherent medical information, their reliability remains suboptimal. Further research should focus on improving the readability of AI-generated IPF content for practical integration into clinical practice.
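The statistical workflow described in the Methods (Mann-Whitney U for between-model score comparisons, ICC for inter-rater agreement) can be sketched as follows. This is a minimal illustration on simulated rater data, not the authors' actual analysis: the score matrices, their means, and the use of ICC(2,1) are assumptions for demonstration only.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def icc2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: array of shape (n_items, k_raters).
    """
    X = np.asarray(scores, dtype=float)
    n, k = X.shape
    grand = X.mean()
    # Sums of squares for rows (items), columns (raters), and residual
    ss_rows = k * ((X.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((X.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((X - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Hypothetical DISCERN totals: 6 raters scoring each model's 23 answers
# (means of 56 and 43 mirror the reported medians; spreads are invented)
rng = np.random.default_rng(0)
gemini = np.clip(rng.normal(56, 4, (23, 6)).round(), 16, 80)
chatgpt = np.clip(rng.normal(43, 8, (23, 6)).round(), 16, 80)

# Between-model comparison on per-question median DISCERN scores
u_stat, p_val = mannwhitneyu(np.median(gemini, axis=1),
                             np.median(chatgpt, axis=1))

# Inter-rater agreement within each model's score matrix
icc_gemini = icc2_1(gemini)
icc_chatgpt = icc2_1(chatgpt)
```

With well-separated score distributions like these, the Mann-Whitney U test yields a very small p-value, matching the pattern reported in the Results.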
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,245 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,100 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,466 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,429 citations