This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Evaluation of ChatGPT-4.0 Versus ChatGPT-Mini in Generating Guideline-Based Hypertension Content
Citations: 0
Authors: 15
Year: 2026
Abstract
BACKGROUND: Artificial intelligence (AI) language models are increasingly used to generate patient education materials. However, their accuracy, completeness, and adherence to clinical guidelines remain uncertain.

OBJECTIVES: To compare ChatGPT-Mini and ChatGPT-4.0 in the generation of hypertension education content with respect to accuracy, completeness, structural quality using the Ensuring Quality Information for Patients (EQIP) tool, response consistency, and alignment with established guidelines.

METHODS: A standardized set of 31 hypertension-related questions was submitted to both models. Outputs were independently evaluated by 10 blinded clinicians using a modified EQIP score, a 5-point accuracy scale, and a 3-point completeness scale. Response consistency was assessed using BERTScore. Between-model comparisons were performed using the two-sided Wilcoxon rank-sum test (p < 0.05). Effect sizes were reported as Hodges-Lehmann (HL) median differences and Cliff's delta (δ), both with 95% CIs. Inter-rater reliability was estimated using the intraclass correlation coefficient (ICC; two-way random-effects model, absolute agreement).

RESULTS: Central tendency measures favored ChatGPT-4.0, although differences were small. Median scores were: accuracy, 4.10 (3.70-4.20) versus 3.73 (3.60-4.05); completeness, 1.26 (1.17-1.41) versus 1.10 (0.96-1.23); and total EQIP score, 19.5 (18.0-25.0) versus 18.5 (16.0-23.0) for ChatGPT-4.0 and ChatGPT-Mini, respectively. HL median differences were small, with 95% CIs crossing zero (accuracy: +0.37, -0.25 to +0.50; completeness: +0.16, -0.06 to +0.36; EQIP: +1.0, -1.0 to +6.0). Cliff's δ values were consistently small and positive across primary outcomes, indicating only modest stochastic dominance of ChatGPT-4.0. Identification clarity tended to be higher with ChatGPT-4.0, whereas response consistency measured by BERTScore F1 was generally higher for ChatGPT-Mini (> 0.92 versus 0.885-0.932). Inter-rater reliability was good to excellent across all measures (ICC > 0.80).

CONCLUSIONS: ChatGPT-4.0 demonstrated small, non-significant improvements in accuracy, completeness, and structural quality compared with ChatGPT-Mini. Effect sizes were modest, and all 95% CIs included zero. ChatGPT-Mini produced more consistent responses. These findings underscore the importance of routinely reporting effect sizes with 95% CIs and support the use of standardized evaluation methods and real-time validation frameworks for AI-generated medical education content.
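The two effect-size measures named in the abstract, Cliff's delta and the Hodges-Lehmann median difference, have simple definitions: δ compares how often a value from one sample exceeds a value from the other, and the HL estimator is the median of all pairwise differences. A minimal plain-Python sketch (function names are illustrative, not from the study's own code):

```python
from statistics import median

def cliffs_delta(x, y):
    """Cliff's delta: (#pairs x>y - #pairs x<y) / (n_x * n_y), in [-1, 1]."""
    gt = sum(1 for xi in x for yj in y if xi > yj)
    lt = sum(1 for xi in x for yj in y if xi < yj)
    return (gt - lt) / (len(x) * len(y))

def hodges_lehmann(x, y):
    """Hodges-Lehmann estimator: median of all pairwise differences x_i - y_j."""
    return median(xi - yj for xi in x for yj in y)

# Toy ratings for two models (not the study's data):
a = [2, 3, 4]
b = [1, 2, 3]
print(cliffs_delta(a, b))   # positive: values in a tend to exceed those in b
print(hodges_lehmann(a, b)) # typical shift of a relative to b
```

The 95% CIs reported in the abstract would typically come from bootstrapping these statistics or from the exact distribution of pairwise differences; that step is omitted here.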
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,549 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,443 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,941 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,792 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Authors
- Romullo José Costa Ataídes
- Marcos Adriano Garcia Campos
- João Vítor Perez de Souza
- Rafael Cardoso Rocha
- Almir Alamino Lacalle
- Ciro Bezerra Vieira
- Thiago Artioli
- Tiago Cordeiro Medeiros
- Erito Marques de Souza Filho
- Ronaldo Altenburg Gismondi
- Érika Maria Gonçalves Campana
- Francisco José Romeo
- Victor Razuk
- João Ricardo Nickenig Vissoci
- Renato Delascio Lopes
Institutions
- Universidade de São Paulo (BR)
- Universidade Brasil (BR)
- Duke University (US)
- Instituto Dante Pazzanese de Cardiologia (BR)
- Universidade Federal do Maranhão (BR)
- Universidade Federal Fluminense (BR)
- Universidade Federal Rural do Rio de Janeiro (BR)
- Universidade do Estado do Rio de Janeiro (BR)
- University of Miami (US)
- Clinical Research Institute (US)