This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Evaluating an artificial intelligence (AI) communication platform for racial biases in information quality about prostate cancer (PCa) germline testing.
Citations: 0
Authors: 9
Year: 2025
Abstract
283 Background: Guidelines recommend germline testing in advanced prostate cancer (PCa) to inform treatment and familial cancer risk, but Black patients are less likely to complete testing than white patients. Artificial intelligence (AI) tools are increasingly used in pre-test counseling to support education, access, and equity, but they remain unevaluated for racial bias, a known hazard in other clinical contexts.

Methods: We adapted a secure, UCSF-developed generative AI platform (“ProGene”) for PCa genetics education. ProGene was prompted with 7 frequently asked questions (FAQs): types of genetic testing and test results, personal benefits, familial benefits, drawbacks, logistics, costs, and privacy concerns. We asked each FAQ 9 times: 3 simulated patients with metastatic PCa (Black, non-Hispanic white, and race-agnostic), each in triplicate, for a total of 63 questions. Two blinded reviewers assessed the 63 responses across 4 domains: 1) comprehensiveness (mean, 0–100%) using an investigator-created rubric; 2) accuracy (proportion, presence/absence of inaccuracies); 3) readability (mean grade level) via the SMOG and Flesch-Kincaid formulas; and 4) actionability (mean, 0–100%) based on the PEMAT. We used the two-sample t-test and Wilcoxon rank-sum test to compare continuous outcomes, and the chi-squared test for categorical outcomes, between race subgroups and across FAQs.

Results: Table 1 summarizes outcomes by race. The mean comprehensiveness score was 67% and did not vary by race. However, for FAQ 1 (types of genetic testing and test results), ProGene responses to the Black patient were less comprehensive than those to the race-agnostic patient (60% vs 93%; p < 0.01). Inaccuracies were present in 32% of responses (median 0, range 0–5 inaccuracies per response), most often misstatements about sample collection and misleading cost/insurance information, and did not vary by race or FAQ.
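The two readability formulas named in the Methods can be computed directly from text. Below is a minimal sketch of the Flesch-Kincaid grade and SMOG formulas using a naive vowel-group syllable counter; validated readability tools use dictionary-based syllable counts, so their grades will differ slightly, and the sample sentence is illustrative, not drawn from ProGene output.

```python
import math
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count vowel groups, dropping a trailing silent "e".
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_kincaid_grade(text: str) -> float:
    # FK grade = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59

def smog_grade(text: str) -> float:
    # SMOG = 1.0430*sqrt(polysyllables * 30/sentences) + 3.1291
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * (30 / len(sentences))) + 3.1291

sample = "Germline testing looks for inherited mutations. Results can guide treatment."
print(round(flesch_kincaid_grade(sample), 1))
print(round(smog_grade(sample), 1))
```

Both formulas rise with sentence length and word complexity, which is why dense genetics vocabulary (e.g. "germline", "pathogenic variant") pushes patient-facing AI responses above the commonly recommended 6th-8th grade reading level.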
Mean readability was 10th grade (SMOG) and 13th grade (Flesch-Kincaid) and did not vary by race or FAQ. Mean actionability was 92% and did not vary by race or FAQ.

Conclusions: We did not identify major racial biases in the quality of the AI communication platform’s responses to PCa germline testing questions. Overall, actionability was high and comprehensiveness and accuracy were moderate, but readability was limited, presenting opportunities for improved prompt engineering, AI model updates, and human oversight. AI communication platforms are a promising tool to promote equity in PCa germline testing delivery, warranting continued refinement and evaluation.

Table 1. Evaluation of AI platform response quality by race.

Quality domain              Black   Non-Hispanic white   Race-agnostic
Comprehensiveness (%)         64            63                 74
Inaccuracy rate (%)           24            38                 33
Readability (grade level)     11            12                 12
Actionability (%)             90            92                 93
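The two-sample comparison named in the Methods can be sketched in pure Python. The sketch below computes Welch's t statistic and degrees of freedom for two groups of comprehensiveness scores; the scores are hypothetical (the abstract's per-response data are not reproduced on this page), and obtaining a p-value would additionally require a t-distribution CDF (e.g. scipy.stats.t.sf).

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t statistic and approximate degrees of freedom.

    Returns (t, df). Welch's variant does not assume equal variances.
    """
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical per-response comprehensiveness scores (%) for two subgroups.
black = [60, 65, 62, 70, 58, 66, 63]
agnostic = [72, 75, 70, 78, 74, 71, 77]
t, df = welch_t(black, agnostic)
print(round(t, 2), round(df, 1))
```

With only 7 responses per subgroup per FAQ, such comparisons are underpowered, which is one reason the abstract pairs the t-test with the rank-based Wilcoxon test.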
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,239 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,095 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,463 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,428 citations