This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

ChatGPT Is Still Not Good Enough at Giving Care-Seeking Advice, or Is It?

2025 · 1 citation · medRxiv · Open Access

Abstract

Artificial intelligence tools like ChatGPT are increasingly used by patients to support their care-seeking decisions, although the accuracy of newer models remains unclear. We evaluated 16 ChatGPT models using 45 validated vignettes, each prompted ten times (7,200 total assessments). Each model classified the vignettes as requiring emergency care, non-emergency care, or self-care. We evaluated accuracy against each case’s gold standard solution, examined the variability across trials, and tested algorithms to aggregate multiple recommendations to improve accuracy. o1-mini achieved the highest accuracy (78%), but we could not observe an overall improvement with newer models, although reasoning models (e.g., o4-mini) improved their accuracy in identifying self-care cases. Selecting the lowest urgency level across multiple trials improved accuracy by 4 percentage points. Although newer models slightly outperform laypeople, their accuracy remains insufficient for standalone use. However, making use of output variability with aggregation algorithms can improve the performance of these models.
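The aggregation rule mentioned in the abstract (selecting the lowest urgency level across repeated trials) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the label strings, and the example data are hypothetical, and the urgency ordering (self-care < non-emergency care < emergency care) is inferred from the three categories named in the abstract.

```python
# Hypothetical sketch of "lowest urgency across trials" aggregation.
# Label strings and ordering are assumptions based on the abstract's categories.
URGENCY_ORDER = ["self-care", "non-emergency care", "emergency care"]
RANK = {label: i for i, label in enumerate(URGENCY_ORDER)}


def lowest_urgency(recommendations):
    """Return the least urgent label among repeated trials for one vignette."""
    if not recommendations:
        raise ValueError("need at least one recommendation")
    return min(recommendations, key=RANK.__getitem__)


def accuracy(predictions, gold):
    """Share of vignettes whose prediction matches the gold standard label."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Study size as stated in the abstract: 16 models x 45 vignettes x 10 trials.
TOTAL_ASSESSMENTS = 16 * 45 * 10

# Made-up example data (NOT from the study): three vignettes, three trials each.
example_trials = [
    ["non-emergency care", "self-care", "non-emergency care"],
    ["emergency care", "emergency care", "emergency care"],
    ["non-emergency care", "non-emergency care", "emergency care"],
]
example_gold = ["self-care", "emergency care", "non-emergency care"]

aggregated = [lowest_urgency(trials) for trials in example_trials]
example_accuracy = accuracy(aggregated, example_gold)
```

Taking the minimum over trials means a single low-urgency answer is enough to override several higher-urgency ones. Whether that trade-off is acceptable in practice depends on how costly a missed emergency is, which this sketch does not model.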

Institutions

Technische Universität Berlin (DE)

Topics

Artificial Intelligence in Healthcare and Education
