This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Harnessing multimodal large language models to interpret ecological momentary assessment-generated caregiving photographs

2026 · 0 citations · Discover Artificial Intelligence · Open Access

Abstract

Multimodal large language models (MLLMs) are rapidly advancing tools capable of synthesizing text and image data, and their potential application with patient- and caregiver-generated health data is gaining increasing attention. Ecological momentary assessment (EMA), which captures real-time experiences using text, ratings, or photographs, offers valuable insights but produces large, diverse datasets that are challenging to integrate into routine clinical care. Photographs, in particular, provide unique contextual information and are currently underutilized due to the substantial time required for review. MLLMs can address this challenge by efficiently summarizing and interpreting mixed-media EMA records. This pilot study evaluated GPT-4’s ability to generate accurate insights from EMA datasets containing photographs with and without accompanying text. Twelve health sciences students simulated the role of glioblastoma family caregivers and submitted photographs and text describing caregiving challenges and successes over seven days. GPT-4 produced photograph descriptions and interpretations and group themes, which were evaluated by expert raters and compared with participant narratives using sentiment and semantic similarity analyses. Results showed that GPT-4 was generally accurate in describing visual content but less reliable in capturing participant-intended meaning. Expert ratings indicated strong accuracy but moderate agreement with ground truth, and semantic similarity analysis revealed loose alignment between participant narratives and GPT-4 interpretations. GPT-4 also exhibited a tendency toward more positive sentiment ratings than participants. While most outputs were accurate, occasional misinterpretations highlighted potential safety concerns. Findings suggest that MLLMs show promise in analyzing caregiver-generated photographs but need continued refinement to maximize their utility within healthcare.

Topics

Artificial Intelligence in Healthcare and Education · Cancer survivorship and care · Health, Environment, Cognitive Aging
