OpenAlex · Updated hourly · Last updated: 07.04.2026, 18:12

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Evaluating the performance of a generative AI model in assessing qualitative health research articles adherence to objective reporting standards

2026 · 0 citations · Scientific Reports · Open Access
Open full text at the publisher

0 citations · 4 authors · Year: 2026

Abstract

As qualitative research increasingly informs patient-centred care, rapid assessment of existing evidence against research guidelines is needed to inform practice settings. We evaluate the performance of Claude, a generative AI model, in assessing qualitative articles' adherence to a consensus-based reporting guideline. The Consolidated Criteria for Reporting Qualitative Research (COREQ), commonly used in qualitative research, serves as the reference criteria list for testing Claude's performance. Fifteen articles from a systematic scoping review were extracted for analysis. Structured prompts were applied to Claude to evaluate whether each COREQ criterion was met for each article. Two independent reviewers checked the model's results for concordance and accuracy. F1 scores, balanced accuracy (BA), the Matthews correlation coefficient (MCC), and other performance metrics were tabulated at the criterion, criterion-domain, and article level. Four main categories were identified from the performance results: (1) balanced (6/32 criteria, 18.75%), (2) under-reported (2/32, 6.25%), (3) mixed errors (9/32, 28.13%), and (4) information-limited (15/32, 46.88%) clusters. Results show heterogeneity across the clusters of criteria. While balanced criteria perform consistently across a range of metrics, criteria in under- or over-reported clusters require targeted prompt adjustments, and information-limited criteria require a larger sample of articles to verify results. Clearly defined criteria outperformed criteria that were broadly defined or required interpretation. Segmenting criteria into performance clusters allows researchers to identify areas of incongruence, so that specific strategies to modify prompts may be applied to any given set of research articles. Expertly crafted, customised approaches can allow rapid extraction of valuable insights that may inform patient-centred recommendations and practice guidelines.
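The metrics named in the abstract (F1, balanced accuracy, MCC) are standard functions of a binary confusion matrix, here framed as model-vs-reviewer judgments of whether a criterion was met. The sketch below shows how they would be computed from such judgments; the example labels are illustrative only and are not the study's data.

```python
# Hedged sketch: per-criterion metrics from binary judgments
# (1 = criterion met, 0 = not met). Illustrative, not the study's code.
import math

def confusion(y_true, y_pred):
    """Confusion counts for binary labels (reviewer vs. model)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def f1(tp, tn, fp, fn):
    # Harmonic mean of precision and recall.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def balanced_accuracy(tp, tn, fp, fn):
    # Mean of sensitivity (recall) and specificity; robust to class imbalance.
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return (sens + spec) / 2

def mcc(tp, tn, fp, fn):
    # Matthews correlation coefficient in [-1, 1]; 0 = chance-level agreement.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Example: reviewer and model judgments for one criterion across 5 articles.
reviewer = [1, 1, 1, 0, 0]
model    = [1, 1, 0, 0, 1]
tp, tn, fp, fn = confusion(reviewer, model)
print(f1(tp, tn, fp, fn), balanced_accuracy(tp, tn, fp, fn), mcc(tp, tn, fp, fn))
```

Computing all three together is useful precisely because they disagree under imbalance: a criterion that is almost always reported can show a high F1 yet a near-zero MCC, which is the kind of heterogeneity the clustering in this study surfaces.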

Topics

Meta-analysis and systematic reviews · Artificial Intelligence in Healthcare and Education · Reliability and Agreement in Measurement