This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Considerations in the Reliability and Fairness Audits of Predictive Models for Advance Care Planning
Citations: 1
Authors: 27
Year: 2022
Abstract
Multiple reporting guidelines for artificial intelligence (AI) models in healthcare recommend that models be audited for reliability and fairness. However, there is a gap in operational guidance for performing reliability and fairness audits in practice. Following guideline recommendations, we conducted a reliability audit of two models based on model performance and calibration, as well as a fairness audit based on summary statistics, subgroup performance, and subgroup calibration. We assessed the Epic End-of-Life (EOL) Index model and an internally developed Stanford Hospital Medicine (HM) Advance Care Planning (ACP) model in three practice settings: Primary Care, Inpatient Oncology, and Hospital Medicine, using clinicians’ answers to the surprise question (“Would you be surprised if [patient X] passed away in [Y years]?”) as a surrogate outcome. For performance, the models had positive predictive value (PPV) at or above 0.76 in all settings. In Hospital Medicine and Inpatient Oncology, the Stanford HM ACP model had higher sensitivity (0.69 and 0.89, respectively) than the EOL model (0.20 and 0.27), and better calibration (O/E 1.5 and 1.7) than the EOL model (O/E 2.5 and 3.0). The Epic EOL model flagged fewer patients (11% and 21%, respectively) than the Stanford HM ACP model (38% and 75%). There were no differences in performance or calibration by sex. Both models had lower sensitivity in Hispanic/Latino male patients with race listed as “Other.” Ten clinicians were surveyed after a presentation summarizing the audit. All 10 reported that summary statistics, overall performance, and subgroup performance would affect their decision to use the model to guide care; 9 of 10 said the same for overall and subgroup calibration. The most commonly identified barriers to routinely conducting such reliability and fairness audits were poor demographic data quality and lack of data access. This audit required 115 person-hours across 8-10 months.
Our recommendations for performing reliability and fairness audits include verifying data validity, analyzing model performance on intersectional subgroups, and collecting clinician-patient linkages as necessary for label generation by clinicians. Those responsible for AI models should require such audits before model deployment and mediate between model auditors and impacted stakeholders.
Contribution to the Field Statement
Artificial intelligence (AI) models developed from electronic health record (EHR) data can be biased and unreliable. Despite multiple guidelines to improve reporting of model fairness and reliability, adherence is difficult given the gap between what guidelines seek and the operational feasibility of such reporting. We try to bridge this gap by describing a reliability and fairness audit of AI models that were considered for use to support team-based advance care planning (ACP) in three practice settings: Primary Care, Inpatient Oncology, and Hospital Medicine. We lay out the data-gathering processes as well as the design of the reliability and fairness audit, and present results of the audit and a decision-maker survey. We discuss key lessons learned, how long the audit took to perform, requirements regarding stakeholder relationships and data access, and limitations of the data. Our work may support others in implementing routine reliability and fairness audits of models prior to deployment into a practice setting.
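The audit's core metrics (PPV, sensitivity, and observed/expected calibration, computed overall and per demographic subgroup) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code; the function names and record format are assumptions for the example.

```python
# Illustrative sketch (not the study's actual code): the reliability and
# fairness metrics from the abstract -- PPV, sensitivity, and the
# observed/expected (O/E) calibration ratio -- overall and per subgroup.

def audit_metrics(y_true, y_prob, threshold=0.5):
    """PPV, sensitivity, and O/E ratio for binary labels and risk scores."""
    y_pred = [int(p >= threshold) for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # O/E: observed event count divided by the sum of predicted risks;
    # values above 1 mean the model under-predicts risk on average.
    expected = sum(y_prob)
    o_e = sum(y_true) / expected if expected else float("nan")
    return {"ppv": ppv, "sensitivity": sensitivity, "o_e": o_e}

def subgroup_audit(records, threshold=0.5):
    """Audit each subgroup in (label, risk_score, subgroup) records."""
    groups = {}
    for y, p, g in records:
        labels, scores = groups.setdefault(g, ([], []))
        labels.append(y)
        scores.append(p)
    return {g: audit_metrics(ys, ps, threshold)
            for g, (ys, ps) in groups.items()}
```

A fairness audit of the kind described would run `subgroup_audit` over intersectional subgroups (e.g. race by sex) and compare per-subgroup sensitivity and O/E against the overall values.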
Related Works
Early Palliative Care for Patients with Metastatic Non–Small-Cell Lung Cancer
2010 · 7,291 citations
Principles of Good Practice for the Translation and Cultural Adaptation Process for Patient-Reported Outcomes (PRO) Measures: Report of the ISPOR Task Force for Translation and Cultural Adaptation
2005 · 5,218 citations
Rehospitalizations among Patients in the Medicare Fee-for-Service Program
2009 · 5,150 citations
On death and dying.
1975 · 4,940 citations
Shared decision-making in the medical encounter: What does it mean? (or it takes at least two to tango)
1997 · 4,078 citations
Authors
- Jonathan Lu
- Amelia Sattler
- Samantha Wang
- Ali Raza Khaki
- Alison Callahan
- Scott L. Fleming
- Rebecca Fong
- Benjamin Ehlert
- Ron Li
- Lisa Shieh
- Kavitha Ramchandran
- Michael F. Gensheimer
- Sarah Chobot
- Stephen Pfohl
- Siyun Li
- Kenny Shum
- Nitin Parikh
- Priya Desai
- Briththa Seevaratnam
- Melanie Hanson
- Margaret Smith
- Yizhe Xu
- Arjun Gokhale
- Steven Lin
- Michael A. Pfeffer
- Winifred Teuteberg
- Nigam H. Shah