This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Addressing the Novel Implications of Generative AI for Academic Publishing, Education, and Research
Citations: 11
Authors: 1
Year: 2024
Abstract
Editor’s Note: The opinions expressed in this editorial do not necessarily reflect the opinions of the AAMC or its members.

Generative artificial intelligence (GenAI), such as ChatGPT, has dramatically changed academic publishing, education, and research in a very short time. Novel issues have arisen for the developers of GenAI, as well as for authors, researchers, editors, reviewers, readers, and the public.1,2 The idea that GenAI, like all tools, could be misused was anticipated, as were many of the legal, financial, and technical challenges associated with artificial intelligence (AI).3–6 But only with the broad use of GenAI tools have genuinely complex and truly unexpected phenomena arisen. For instance, the observation that GenAI would invent or fabricate data and references that it deemed “should” exist, that is, that GenAI would hallucinate (Dictionary.com’s 2023 Word of the Year,7 a choice acknowledging GenAI’s profound ramifications), caught us all by surprise.8,9

The literature on the use of AI tools in academic scholarship has rapidly expanded, and the findings are humbling. As one example, Májovský et al10 performed a proof-of-concept study using ChatGPT to create a “highly convincing” but “completely fabricated article” in the field of neurosurgery. In 1 hour, the authors generated a full manuscript that appeared “sophisticated and seemingly flawless” and included an “abstract, introduction, material and methods, discussion, references, charts, etc.,” with 1,992 words and 17 citations. Only careful review by experts from different disciplines revealed the errors in the fabricated article. In another recently published report,11 linguists Casal and Kessler performed a structured interview study of journal reviewers (n = 72) and found that they “were largely unsuccessful in identifying AI versus human writing, with an overall positive identification rate of only 38.9%.”

Moreover, the use of AI itself in evaluating human-generated versus AI-generated scholarship has revealed mixed results thus far. Gao et al,12 for example, found that an AI detection tool based on the GPT-2 large language model greatly outperformed blinded human reviewers in discriminating between manuscript abstracts that were human-written and those generated by ChatGPT. In that same project, a plagiarism detector website rated human-written abstracts as more likely to have been plagiarized, because a greater percentage of matching text was found online; AI-generated abstracts were rated as less similar to existing text. The mixed results emerging from this literature point to a need for further research on the possible merits and risks of using GenAI tools to enhance the writing process, as well as on the development of robust scientific integrity safeguards. (One common detection approach is sketched after this passage.)

Academic Medicine proffered guidance to authors in 2023 regarding the use of AI tools in the preparation of manuscripts for our journal.13 We introduced a new policy for submissions that emphasized ethically salient aspects of accountability, disclosure, and transparency for authors engaging with AI tools.13 In keeping with the position of the Committee on Publication Ethics,14 we affirmed that AI tools must not be listed as authors.
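The detection findings summarized above lend themselves to a concrete illustration. The sketch below shows one common screening approach, perplexity scoring under a GPT-2 language model using the Hugging Face transformers library: machine-generated text tends to be more predictable to a language model and thus to score lower perplexity. This is a minimal sketch of the general technique under stated assumptions, not the specific detector evaluated by Gao et al12; the threshold is an invented placeholder, and real tools calibrate such cutoffs on labeled examples.

```python
# Minimal sketch: perplexity-based screening of a candidate abstract.
# Illustrative only; this is not the detector used by Gao et al, and the
# threshold below is a hypothetical placeholder, not a validated cutoff.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under GPT-2; lower values mean the text is more
    'predictable' to the model, a weak signal of machine generation."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        out = model(enc.input_ids, labels=enc.input_ids)
    return torch.exp(out.loss).item()

abstract = "Background: We investigated ..."  # invented candidate abstract text
THRESHOLD = 40.0  # hypothetical; real tools calibrate this on labeled data
score = perplexity(abstract)
print(f"perplexity = {score:.1f} -> "
      f"{'flag for human review' if score < THRESHOLD else 'no flag'}")
```

Perplexity screening is brittle in practice; paraphrasing tools and newer models can evade it, which is consistent with the mixed results described above.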
To serve as authors, individuals must have fulfilled 4 key requirements: contributing in a substantive manner to a manuscript, participating in the writing or revising of the work, providing approval for the final version of the work, and agreeing to be publicly accountable for all aspects of the work.15,16 Given the potential risks and rapid evolution of AI tools, we further highlighted the need for ongoing efforts by authors to ensure the “accuracy, rigor, and integrity” of their scholarship.14

Working in parallel with our authors, the full editorial team of Academic Medicine committed to ongoing review and revision of the policies and practices of our journal “to align with the academic standards of our field.”13 The considerations we have begun to work through are wide-ranging: the integrity and accessibility of data sets, strengthened review procedures, appropriate and selective use of AI detection tools in evaluating submissions, stricter requirements for authors and reviewers, safeguards within editorial management and publisher software, added monitoring processes, and potential consequences for authors who submit AI-generated work without disclosing their use of these tools. Members of our editorial team are also engaged with colleagues throughout the field of academic medicine and academic publishing to study, assess, and address emerging ethical questions related to scientific integrity and misconduct in the new era of GenAI.17

Many of these ideas resonate with the recommendations advanced in a recent editors’ statement18 on the responsible use of GenAI technologies in scholarly journal publishing. Those editors recommended that large language models such as GPT and other GenAI tools not be included as authors and that authors be fully transparent about their use of AI tools. The editors provided further guidance on editors’ and reviewers’ roles in evaluating scholarly submissions, including that editors should have “access to tools and strategies for ensuring authors’ transparency,” that editors and reviewers should not themselves “rely solely” on GenAI to review manuscript submissions, and that editors should “retain full responsibility” for selecting reviewers and overseeing the review process. The last recommendation was that the ultimate responsibility for editing a manuscript resides with “human authors and editors.”

Creating additional safeguard practices and clearer policies should help lessen the likelihood that journals inadvertently publish scholarly works that have been fabricated or are otherwise fraudulent owing to the use of GenAI. As noted by Májovský et al,10 several measures are needed to reduce this risk across scholarly publishing: providing source data sets; establishing rigorous review procedures; creating ethical regulations for publishers and academic institutions; and imposing adverse consequences, such as temporary or permanent bans from publishing in certain journals, on researchers found to have engaged in misconduct. Journals, publishers, professional societies, scholars, and other stakeholders will need to consider such ideas carefully, and quickly.

An analysis by Lee et al19 examined the views and policies, as of July 2023, of the 50 leading journals in one specialty of medicine (i.e., radiology) and found that 45% did not include guidance on the use of AI in their submission guidelines for authors.
Most (82%) of the journals that did share guidance deferred to the policies of a large publishing group, suggesting that specialty-specific issues will require additional thought. In this illustration drawn from the field of radiology, the question of verifying the authenticity of images that might be AI-generated will be particularly salient.

The imperative for greater understanding of the applications and societal implications of AI is clear. An Executive Order20 from the White House issued in October 2023 declared the need for society “to foster capabilities for identifying and labeling synthetic content produced by AI systems, and to establish the authenticity and provenance of digital content.” The order also initiated an effort to study and document “issues that may hinder the effective use of AI in research and practices needed to ensure that AI is used responsibly for research.”

In academic medicine, the need for greater AI literacy is also apparent. Recognizing the need for physicians-in-training to have greater competence in medical AI, Lee et al21 used an iterative Delphi method, with broad engagement of experts, faculty, and students from medical schools across South Korea, to derive 6 broad domains of medical AI competencies. Four of the domains were identified as essential for medically trained graduates to understand: digital health and the changes driven by AI; fundamental knowledge and skills in medical AI; the ethical and legal aspects of the use of medical AI; and the application of medical AI in clinical practice. Two other domains, viewed as important but optional, were (1) processing, analyzing, and evaluating medical data and (2) undertaking research and development of medical AI. Taken together, these 6 domains encompass 36 specific competencies and subcompetencies. This report promises to be helpful as medical educators accelerate their efforts to develop curricula that are responsive to the increasing role and accelerating use of medical AI.

How AI methods are widely used in health professions education, and their potential benefits and problems, are thoughtfully described by Patino et al22 in this issue of the journal. In terms of advantages, those authors suggest that AI tools have the potential to complete certain data-related tasks with less direct human effort and in less time. These possible advantages were also mentioned in the Innovation Report by Laupichler et al,23 appearing in this issue. In that report, researchers compared the performance of medical student volunteers (n = 161) on multiple-choice examination questions developed by humans and by ChatGPT. They found that the two sets of questions were similarly difficult but that the questions developed by human authors had significantly higher discriminatory power and thus were better able to differentiate student test performances. The underlying reasons for this statistical result remain unclear: are experienced educators better at developing more salient and more valid questions, or was the result due to a technical issue with GenAI tools that could, with time, be resolved? Interestingly, the student volunteers were able to identify, with 57% accuracy (only modestly above chance), whether the questions were created by human or ChatGPT sources. (One standard way of quantifying item difficulty and discrimination is sketched below.)
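For readers unfamiliar with the psychometrics behind “discriminatory power,” the sketch below shows one standard formulation from classical item analysis: item difficulty as the proportion of correct answers, and item discrimination as the point-biserial correlation between performance on an item and the rest-of-test score. The exact statistic computed by Laupichler et al is not specified here, and the response matrix below is invented for illustration.

```python
# Minimal sketch of classical item analysis: difficulty and discrimination.
# The response matrix is invented; this is one standard formulation, not
# necessarily the exact statistic reported by Laupichler et al.
import numpy as np

rng = np.random.default_rng(0)
# responses[s, i] = 1.0 if student s answered item i correctly, else 0.0
responses = (rng.random((161, 20)) > 0.4).astype(float)  # hypothetical 161 x 20

def item_stats(responses: np.ndarray):
    n_students, n_items = responses.shape
    difficulty = responses.mean(axis=0)  # proportion correct per item
    total = responses.sum(axis=1)        # each student's total score
    discrimination = np.empty(n_items)
    for i in range(n_items):
        rest = total - responses[:, i]   # exclude the item from its own criterion
        # Point-biserial correlation: item score (0/1) vs. rest-of-test score.
        discrimination[i] = np.corrcoef(responses[:, i], rest)[0, 1]
    return difficulty, discrimination

p, r_pb = item_stats(responses)
print(f"mean difficulty = {p.mean():.2f}, mean discrimination = {r_pb.mean():.2f}")
```

By this convention, items with discrimination near zero barely separate stronger from weaker examinees; the finding above would correspond to human-written items showing systematically higher point-biserial values than the ChatGPT-generated ones.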
GenAI tools risk perpetuating or amplifying bias, as noted by Patino et al and others.22,24 Bias may result from how the underlying algorithms were developed, trained, and deployed.25 Such biases may greatly affect the interpretation and application of findings and risk generating misinformation or disinformation.22 For these reasons, the authors identify trustworthiness as a crucial issue, particularly when the task at hand is “high stakes,” e.g., when it shapes clinical care recommendations, as shown by Kasun et al.24 In health professions education, evaluating trainee performance is similarly “high stakes,” and the need for trustworthiness is paramount.

For these reasons, Patino et al conclude that AI methods should be engaged carefully and considered as part of a large repertoire of techniques (e.g., biostatistics) in health professions education: “AI methods are not magical or infallible, and their use requires thoughtful reflection. They should be seen as tools, complementing the other resources and skills that faculty and researchers already possess.”22

In making this argument, Patino et al continue to place the responsibility for trustworthiness on the shoulders of faculty members and researchers, the human beings who serve in the field of academic medicine and fulfill the obligations of our profession. Ensuring that AI is used ethically to advance the salutary aims of academic medicine certainly entails trustworthy actions by faculty members and researchers. As the surprising and potent consequences of the ever-widening use of AI have already taught us, however, it will take much, much more than individual efforts to safeguard against mischief, misuse, and misconduct related to applications of AI, especially in high-stakes activities in clinical care and clinical training. It will take the dedicated commitment of all leaders and stakeholders of our field, working together and fully aware that the potential consequences may lie far outside our current imaginations.
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,231 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,084 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,444 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,423 citations