This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Data availability for "Critical appraisal tools for Artificial Intelligence clinical Studies. A scoping review" (Preprint)
Citations: 0
Authors: 12
Year: 2025
Abstract
<sec> <title>BACKGROUND</title> Health research that uses predictive and/or generative artificial intelligence (AI) is growing rapidly. Just as in traditional clinical studies, the way AI studies are conducted can introduce systematic errors. Translating this AI evidence into clinical practice and research requires critical appraisal tools for clinical decision makers and researchers. We carried out a scoping review (DOI: 10.2196/preprints.77110) to identify existing tools for the critical appraisal of clinical studies that use AI and to examine the concepts and domains these tools explore. The question was framed using the PCC framework. P (Population): artificial intelligence clinical studies. C (Concept): tools for critical appraisal and associated constructs such as quality, reporting, validity, risk of bias, and applicability. C (Context): clinical practice. In addition, bias classification and chatbot assessment studies were included. A total of 70 records were included in the review: 46 were reporting guidelines, 15 were tools for critical appraisal, 2 addressed study quality, and 2 addressed risk of bias. Nine papers focused on bias classification or mitigation. We found 15 chatbot assessment studies or systematic reviews of chatbot studies (6 and 9, respectively), which form a very heterogeneous group. </sec> <sec> <title>OBJECTIVE</title> To publish the primary data used in our scoping review for public availability. </sec> <sec> <title>METHODS</title> We searched medical and engineering databases (MEDLINE, EMBASE, CINAHL, PsycINFO, and IEEE) from inception to April 2024. We included clinical primary research presenting tools for critical appraisal. Classic reviews and systematic reviews were included in the first phase of screening; they were excluded in the second phase, after new tools had been identified by forward snowballing. We excluded non-human, computer, and mathematical research, as well as letters, opinion papers, and editorials. We used Rayyan for screening.
Data extraction was performed by two observers, and discrepancies were resolved by discussion. The protocol was registered in advance in OSF (https://doi.org/10.17605/OSF.IO/ETYDS). We adhered to the PRISMA extension for scoping reviews and to the PRISMA-S extension for reporting literature searches in systematic reviews. Data extraction for chatbot studies used a hybrid approach, combining the active involvement of a researcher with a fully supervised ChatGPT retrieval-augmented generation (RAG) model. This strategy was adopted given the predictable heterogeneity of chatbot interventions, with the aim of enhancing the clarity and reproducibility of the extracted data. ChatGPT-4o was employed to assist in drafting and refining the extraction tables, but all outputs were independently reviewed by two authors against the original articles. Discrepancies were reviewed and resolved through consensus. No sensitive data were exposed. To promote transparency and reproducibility, the exact prompts used in the RAG process are included in this data set. As an example of the process: the systematic review RAG prompt (in Spanish) was used for the data extraction of Oh 2021. The PDF was uploaded to ChatGPT-4o, and the output is shown in the PDF file (example of ChatGPT output for a paper). The result was incorporated as a column in the XLS Rev_sis extraction (it can be seen in column H). This procedure was repeated for all the articles, both systematic reviews and primary research (Rev_sis extraction and Primary studies extraction). Both XLS files were transposed from columns to rows to build the final XLS files (S reviews table draft and Primary studies table draft). The result was translated from Spanish to English by the reviewers. Both table drafts were checked against the original papers by LLR and afterwards verified by JCB. The final tables were synthesized as shown in the scoping review, where they appear as Tables 4 and 5.
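The column-to-row transposition described above is a simple mechanical step. A minimal sketch of it in Python, assuming the sheets have been exported to CSV (the file names and the use of the stdlib `csv` module are illustrative assumptions; the actual dataset files are XLS workbooks):

```python
import csv

def transpose_csv(src_path, dst_path):
    """Turn a sheet with one extracted paper per COLUMN into one paper per ROW.

    Hypothetical helper for illustration; assumes all columns have equal length.
    """
    with open(src_path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    transposed = list(zip(*rows))  # columns become rows
    with open(dst_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(transposed)
```

The same transposition could of course be done directly in the spreadsheet application (Paste Special > Transpose); the sketch only documents the transformation itself.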
</sec> <sec> <title>RESULTS</title> We identified 4392 records in databases and registries. After eliminating 470 duplicates, 3922 records were screened by title and abstract, and 3803 were excluded. The remaining 119 underwent full-text screening, and 59 were excluded for the following reasons: 50 were systematic reviews, 7 met the exclusion criteria, and 2 did not meet the inclusion criteria. Full details are available in the supplementary material (Exclusions after full-text screening). Of the 50 systematic reviews, 42 used specific AI tools to assess the quality of the studies, and the tools retrieved were incorporated into "Records identified via other methods". Twelve studies were identified in the EQUATOR Network library, and four additional studies were obtained from experts and organizations; therefore, there were 58 records identified via other methods. Forty-eight of these were already captured among the 60 studies included from the electronic database search, leaving 10 additional studies to be included. Thus, a total of 70 studies were included in this review. </sec> <sec> <title>CONCLUSIONS</title> This data set may be useful for other researchers who want to corroborate or further develop our results on critical appraisal tools for artificial intelligence clinical studies. Conflict of interest: all authors declare no conflict of interest. Funding: no funding was received for this research. Multimedia appendix: IRB approval was not applicable to this study and data set. The scoping review built on this data set has been accepted for publication in the Journal of Medical Internet Research. We enclose the following files: GPT-RAG (prompting) TEMPLATE ENGLISH; GPT-RAG (prompting) TEMPLATE PDF Example; XLS CA Tools & Bias 10 10 2025; Chatbot 10 10 2025. </sec>
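The study-flow counts in the RESULTS section can be checked arithmetically. A quick sketch, with all numbers taken directly from the text above:

```python
# Arithmetic check of the PRISMA-style flow numbers reported in RESULTS.
identified = 4392
duplicates = 470
screened = identified - duplicates     # records screened by title/abstract
full_text = screened - 3803            # records sent to full-text screening
excluded_full_text = 50 + 7 + 2        # systematic reviews + exclusion criteria + inclusion criteria
included_db = full_text - excluded_full_text   # studies included from the database search
other_methods = 42 + 12 + 4            # tools from reviews + EQUATOR library + experts/organizations
additional = other_methods - 48        # not already captured by the database search
total_included = included_db + additional

print(screened, full_text, included_db, total_included)  # 3922 119 60 70
```

Every intermediate count matches the figures stated in the text, so the reported flow is internally consistent.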
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,260 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,116 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,493 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,438 citations