This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Developing artificial intelligence tools for institutional review board pre-review: A pilot study on ChatGPT’s accuracy and reproducibility
Citations: 4
Authors: 4
Year: 2025
Abstract
This pilot study is the first phase of a broader project aimed at developing an explainable artificial intelligence (AI) tool to support the ethical evaluation of Japanese-language clinical research documents. The tool is explicitly not intended to assist document drafting. We assessed the baseline performance of generative AI models, Generative Pre-trained Transformer (GPT)-4 and GPT-4o, in analyzing clinical research protocols and informed consent forms (ICFs). The goal was to determine whether these models could accurately and consistently extract ethically relevant information, including the research objectives and background, research design, and participant-related risks and benefits. First, we compared the performance of GPT-4 and GPT-4o using custom agents developed via OpenAI's Custom GPT functionality (hereafter "GPTs"). Then, using GPT-4o alone, we compared outputs generated by GPTs optimized with customized Japanese prompts to those generated with standard prompts. GPT-4o achieved 80% agreement in extracting research objectives and background and 100% in extracting research design, while both models demonstrated high reproducibility across ten trials. GPTs with customized prompts produced more accurate and consistent outputs than those with standard prompts. This study suggests the potential utility of generative AI in pre-institutional review board (IRB) review tasks; it also provides foundational data for future validation and standardization efforts involving retrieval-augmented generation and fine-tuning. Importantly, this tool is intended not to automate ethical review but rather to support IRB decision-making. Limitations include the absence of gold-standard reference data, reliance on a single evaluator, the lack of convergence and inter-rater reliability analysis, and the inability of AI to substitute for in-person elements such as site visits.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,231 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,084 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,444 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,423 citations