Harnessing AI for Comprehensive Reporting of Medical AI Research
Citations: 0 · Authors: 1 · Year: 2025
Abstract
In this editorial, I would like to succinctly discuss the potential of using AI to improve the reporting of medical AI research. There are already several published guidelines and checklists in the current literature, but how they are interpreted and implemented varies across publishers, editors, reviewers and authors. Here, I discuss the possibility of harnessing generative AI tools to assist authors in comprehensively reporting their AI work and meeting current guidelines, with the ultimate aim of improving transparency and replicability in medical AI research. The succinct discussion below addresses two key issues: (1) AI has a seductive allure that might affect how AI-generated evidence is scrutinised and disseminated, hence the need for comprehensive and transparent reporting, and (2) authors sometimes feel uncertain about what to report in the light of so many existing guidelines on reporting AI research and the lack of consensus in the field. It has been argued that extraneous or irrelevant information with a seductive allure can improve the ratings of scientific explanations [1]. AI, with its overhyped knowledgeability, can convey biases and false information that readers might judge believable [2]. AI can write highly convincing text that can impress or deceive readers, even in the presence of errors and false information [3, 4]. Likewise, merely mentioning "AI" in the title of a research paper seems to increase its citation potential [5]. The latter might incentivise scientists to use AI purely to boost the citability of their work, regardless of whether AI improved its quality. In this context, one might speculate that some publications that used AI but with flawed methodologies or wrong conclusions have slipped through the cracks of peer review, with many already being indexed and citable [6].
Overall, emerging evidence suggests that AI has an intrinsic seductive allure that is shaping the medical research landscape and affecting how readers appraise research articles that employ AI. This is why improving the reporting and evaluation of AI work is of paramount importance, and in this editorial, I underscore the potential role of generative AI for that purpose. Consider this: readers might find a paper entitled "Association between condition X and biomarker Y demonstrated with deep learning" novel and worth reading. Now, imagine the same finding demonstrated with a traditional analysis method and entitled "Association between condition X and biomarker Y demonstrated with a correlation analysis" (though it is unlikely that the authors of the latter would consider correlation analysis worth mentioning in the article title). Although both pieces of work report the same finding, they may not enjoy the same buzz and high citability in the field. This is because AI-based methods and traditional analysis methods operate at different maturity levels. Readers (and reviewers) are quite familiar with the scope and limitations of a correlation analysis, but the same cannot be said of AI. Having clear guidelines on how to comprehensively report and rigorously evaluate medical AI research is thus extremely important. No one denies AI's huge potential in medical research, such as automating the analysis of complex medical data and accelerating the discovery of useful markers. However, AI may discover new data-driven features and disease-marker relationships that do not always align with prior medical knowledge, raising the question of how to reconcile common medical knowledge with AI-generated evidence. Likewise, there is a risk that AI's seductive allure might diminish the critical analysis and scrutiny of AI-generated evidence, thus weakening the rigour of the peer review process in evaluating AI papers.
Therefore, when AI is used to enhance the process of scientific discovery, the core principles of scientific methodology, including falsifiability, must be upheld. However, when it comes to falsifiability, independently testing and disproving AI-generated evidence remains difficult. For example, does a 2% reduction in accuracy or another performance metric disprove a particular AI method? Indeed, there is no consensus on the conceptual and methodological frameworks by which AI-generated evidence can be scrutinised and falsified. This is because deploying AI to study a particular question involves several aspects that create multiple sources of error or bias that are not always easy to gauge: how data are curated, cleaned, imputed, augmented, divided or aggregated; how relevant features are identified, reduced or combined; and how the AI architecture is built, trained or validated. As AI can generate fabricated articles, including articles with empirical results (see discussion in [4]), frameworks that uphold falsification are paramount in AI research [7]. The recent example of the AI-Generated Science (AIGS) system [8], with AI agents that can independently and autonomously create knowledge, raises significant questions for AI research at many ethical, legal and scientific levels. This is why the authors of AIGS identified falsification as a core agent of that system, tasked with verifying and scrutinising AI-generated scientific discoveries. To minimise the risk of a proliferation of flawed or fabricated AI research that could harm clinical practice, many guidelines and checklists for improving the reporting of medical AI research have been proposed. Such AI reporting guidelines are very useful for supporting authors in comprehensively presenting their AI work and for enhancing the rigorous evaluation of that work during the peer review process.
Some of the existing guidelines include MAIC-10 (Must AI Criteria-10), CLAIM (Checklist for Artificial Intelligence in Medical Imaging), STARD-AI (Standards for Reporting of Diagnostic Accuracy Study-AI), MI-CLAIM (Minimum Information about Clinical Artificial Intelligence Modeling), MINIMAR (Minimum Information for Medical AI Reporting), RQS (Radiomics Quality Score), QAMAI (Quality Analysis of Medical Artificial Intelligence), TRIPOD+AI (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis), CONSORT-AI (Consolidated Standards of Reporting Trials-AI), SPIRIT-AI (Standard Protocol Items: Recommendations for Interventional Trials-AI), FUTURE-AI (Fairness Universality Traceability Usability Robustness Explainability-AI), CAIR (Clinical AI Research), DECIDE-AI (Developmental and Exploratory Clinical Investigations of DEcision support systems driven by Artificial Intelligence), CLEAR (CheckList for EvaluAtion of Radiomics research) and DOME (Data, Optimization, Model and Evaluation); see discussion in [9-11]. The relevance of each checklist depends on the specific topic and scope of the AI research. However, AI researchers feel overwhelmed (and sometimes confused) by so many guidelines and checklists that are not implemented or interpreted in the same way by reviewers, editors and publishers. Hence, to maximise their impact and usefulness, publishers should consider offering easy-to-follow article templates that explicitly specify what one must report in each section of a manuscript in order to meet their guidelines and checklists for AI research. Likewise, similar to existing AI-powered tools for plagiarism detection, image manipulation and language editing, publishers should join forces with AI developers to create AI-powered tools that can automatically flag submissions that do not conform to specific guidelines and provide constructive feedback to authors on how to improve the reporting of their AI research.
Such AI tools can be made accessible to authors before submission to guide them through the process of improving their manuscripts. These tools should be fine-tuned and updated regularly to meet the ever-changing challenges and trends of AI research, thereby ensuring comprehensive and accurate reporting of medical AI research and ultimately improving transparency and replicability in the field. The author declares no conflicts of interest. Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,493 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,377 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,835 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,555 citations