Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
Counterfactual prediction from machine learning models: transportability and joint analysis for model development and evaluation using multi-source data
0
Zitationen
6
Autoren
2025
Jahr
Abstract
BACKGROUND: When a machine learning model is developed and evaluated in a setting where the treatment assignment process differs from the setting of intended model deployment, failure to account for this difference can lead to suboptimal model development and biased estimates of model performance. METHODS: We consider the setting where data from a randomized trial and an observational study emulating the trial are available for machine learning model development and evaluation. We provide two approaches for estimating the model and assessing model performance under a hypothetical treatment strategy in the target population underlying the observational study. The first approach uses counterfactual predictions from the observational study only and relies on the assumption of conditional exchangeability between treated and untreated individuals (no unmeasured confounding). The second approach leverages the exchangeability between treatment groups in the trial (supported by study design) to "transport" estimates from the trial to the population underlying the observational study, relying on an additional assumption of conditional exchangeability between the populations underlying the observational study and the randomized trial. RESULTS: We examine the assumptions underlying both approaches for fitting the model and estimating performance in the target population and provide estimators for both objectives. We then develop a joint estimation strategy that combines data from the trial and the observational study, and discuss benchmarking of the trial and observational results. CONCLUSIONS: Both the observational and transportability analyses can be used to fit a model and estimate performance under a counterfactual treatment strategy in the population underlying the observational data, but they rely on different assumptions. In either case, the assumptions are untestable, and deciding which method is more appropriate requires careful contextual consideration. If all assumptions hold, then combining the data from the observational study and the randomized trial can be used for more efficient estimation.
Ähnliche Arbeiten
Applied logistic regression
1990 · 35.656 Zit.
The central role of the propensity score in observational studies for causal effects
1983 · 30.667 Zit.
SPSS and SAS procedures for estimating indirect effects in simple mediation models
2004 · 17.091 Zit.
A Proportional Hazards Model for the Subdistribution of a Competing Risk
1999 · 13.479 Zit.
Asymptotic Confidence Intervals for Indirect Effects in Structural Equation Models
1982 · 12.606 Zit.