This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Utilization of ChatGPT to Simplify Complex Dermatopathology Reports Into Patient-Friendly Language
Citations: 2
Authors: 7
Year: 2024
Abstract
To the Editor: As mandated by the 21st Century Cures Act Final Rule, dermatopathology reports are immediately released to patients and are often read before providers contact them to discuss results.1 Because these reports are written in specialized language, patients may not fully understand the results. One study showed that patients independently interpreting a dermatopathology report correctly identified the diagnosis only 12% of the time, and 92% felt worried after reading the report.2 It has been suggested that online chatbots such as Chat Generative Pre-Trained Transformer (ChatGPT) could aid patients in deciphering dermatopathology results.3 In a study assessing ChatGPT's translation of hypothetical dermatopathology reports for common dermatologic diagnoses into patient-friendly language, most physicians felt responses were factual and comprehensible.4 However, dermatopathology reports vary widely in scope and complexity. Before chatbots can be implemented in practice or recommended as patient resources, it is important to understand their performance when interpreting less straightforward cases. In this study, we evaluated ChatGPT's ability to summarize and simplify more nuanced, real dermatopathology reports.

Real deidentified dermatopathology reports describing 5 neoplastic and 5 inflammatory dermatologic conditions were collected (Table 1). ChatGPT 4.0 was asked to summarize the reports at a sixth-grade reading level and recommend urgency of dermatology follow-up (see Supplemental Digital Content 1, https://links.lww.com/AJDP/A153).5 Three board-certified dermatopathologists evaluated the responses for accuracy, completeness, readability, and potential to cause undue anxiety or harm on a 5-point Likert scale (see Supplemental Digital Content 1, https://links.lww.com/AJDP/A153).5 On this scale, a score of 5 corresponded to highest levels of accuracy, completeness, and readability, and lowest likelihood of causing anxiety or harm.
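The per-report summary statistics in Table 1 (mean with SD across the three raters) can be reproduced with a minimal sketch. The rater values below are hypothetical illustrations, not the study's raw data:

```python
import statistics

def summarize_ratings(scores):
    """Return (mean, sample SD) of one criterion's Likert scores, rounded to 1 decimal,
    matching the "mean (±SD)" format used in Table 1."""
    return round(statistics.mean(scores), 1), round(statistics.stdev(scores), 1)

# Hypothetical accuracy scores from three raters for a single report
accuracy = [5, 5, 4]
mean, sd = summarize_ratings(accuracy)
print(f"{mean} (±{sd})")  # → 4.7 (±0.6)
```

Sample SD (`statistics.stdev`) is used here, consistent with small rater panels; the letter does not specify whether sample or population SD was computed.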
Variance among rater responses was evaluated by SD. Readability of responses was analyzed using the Flesch-Kincaid Reading Grade Level. This study was approved by the Columbia University Institutional Review Board (AAAV1558).

TABLE 1. Average 5-Point Likert Scores for ChatGPT Interpretations of Dermatopathology Reports

 #  Diagnosis                                           Accuracy     Readability  Completeness  Unlikely to Cause Harm
 1  Atypical lymphoid infiltrate                        5.0 (±0)     4.7 (±0.6)   4.7 (±0.6)    4.7 (±0.6)
 2  Atypical intraepidermal melanocytic proliferation   5.0 (±0)     5.0 (±0)     5.0 (±0)      4.3 (±0.6)
 3  Atypical basaloid proliferation                     5.0 (±0)     5.0 (±0)     5.0 (±0)      4.3 (±0.6)
 4  Atypical spitzoid melanocytic proliferation         5.0 (±0)     3.3 (±2.1)   4.7 (±0.6)    3.3 (±1.2)
 5  Melanocytoma                                        5.0 (±0)     4.7 (±0.6)   5.0 (±0)      5.0 (±0)
 6  Interface dermatitis                                4.7 (±0.6)   4.7 (±0.6)   5.0 (±0)      4.0 (±1.7)
 7  Small vessel vasculitis                             4.3 (±1.2)   4.3 (±1.2)   5.0 (±0)      4.0 (±1.7)
 8  Non-necrotizing granulomatous dermatitis            5.0 (±0)     5.0 (±0)     5.0 (±0)      5.0 (±0)
 9  Psoriasiform dermatitis                             5.0 (±0)     5.0 (±0)     5.0 (±0)      5.0 (±0)
10  Lichenoid interface dermatitis                      4.7 (±0.6)   5.0 (±0)     5.0 (±0)      4.7 (±0.6)
    Average across all reports                          4.9 (±0.43)  4.7 (±0.8)   4.9 (±0.3)    4.4 (±0.9)

1 indicates strong disagreement and 5 indicates strong agreement; parentheses indicate SD.

Interpretations of dermatopathology reports were generated by ChatGPT 4.0 on July 23, 2024 (see Supplemental Digital Content 1, https://links.lww.com/AJDP/A153). Results are summarized in Table 1. The average ratings were 4.9 for accuracy, 4.7 for readability, 4.9 for completeness, and 4.4 for being unlikely to cause harm. Across responses to all questions, the SD was 0.6, indicating reasonable interrater consistency. The average Flesch-Kincaid Reading Grade Level was 11.5 (range 9.6–14.1), and the average word count of ChatGPT responses was 225 (range 174–306). Overall, raters felt that ChatGPT interpretations of less common dermatopathology reports generated correct, complete, and easy-to-understand information.
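The Flesch-Kincaid Reading Grade Level used to score readability can be approximated in a few lines. This sketch uses a naive vowel-group syllable counter (no silent-e handling), so its estimates will differ slightly from standard implementations:

```python
import re

def count_syllables(word):
    """Naive syllable estimate: count runs of vowels; every word gets at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text):
    """Flesch-Kincaid Reading Grade Level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59"""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59

print(round(fk_grade("The cat sat on the mat."), 2))  # → -1.45
```

Short monosyllabic sentences score below grade 0, while the long, polysyllabic sentences typical of ChatGPT summaries push the grade level upward, which is consistent with the average of 11.5 reported above.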
Accuracy and completeness were the highest rated aspects of responses, with low variance in scoring between dermatopathologists. Evaluators felt most responses were unlikely to cause undue anxiety or physical harm to patients. However, some raters were concerned that equivocating language in some responses might cause unnecessary stress for patients. For instance, the interpretation of the dermatopathology report for the melanocytoma noted that these lesions are "usually not cancerous," whereas 1 rater felt the lesion should have been described as completely benign (see Supplemental Digital Content 1, https://links.lww.com/AJDP/A153). Average readability of responses was grade 11.5, substantially higher than the sixth-grade level requested by the prompt and suggested for medical educational materials.5 Raters felt that less verbose responses would likely increase readability and decrease confusion and anxiety for theoretical patient readers. Though generalizability of these results is limited by the small number of reports and evaluators, this study suggests that, with further fine-tuning, ChatGPT may represent a helpful resource for interim dermatopathology interpretation for patients who choose to read their reports before reviewing them with their provider.