This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Using data mining techniques to characterize participation in observational studies
Citations: 44
Authors: 2
Year: 2016
Abstract
Data mining techniques are gaining in popularity among health researchers for an array of purposes, such as improving diagnostic accuracy, identifying high-risk patients and extracting concepts from unstructured data. In this paper, we describe how these techniques can be applied to another area in the health research domain: identifying characteristics of individuals who do and do not choose to participate in observational studies. In contrast to randomized studies, where individuals have no control over their treatment assignment, participants in observational studies self-select into the treatment arm and therefore have the potential to differ in their characteristics from those who elect not to participate. These differences may explain part, or all, of the difference in the observed outcome, making it crucial to assess whether there is differential participation based on observed characteristics. Compared with traditional approaches to this assessment, data mining offers a more precise understanding of these differences. To describe and illustrate the application of data mining in this domain, we use data from a primary care-based medical home pilot programme and compare the performance of commonly used classification approaches (logistic regression, support vector machines, random forests and classification tree analysis (CTA)) in correctly classifying participants and non-participants. We find that CTA is substantially more accurate than the other models. Moreover, unlike the other models, CTA is transparent in its computational approach, is easy to interpret via the decision rules it produces, and provides statistical results familiar to health researchers. Beyond their application to research, data mining techniques could help administrators to identify new candidates for participation who may most benefit from the intervention.
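To make the idea of a decision rule that separates participants from non-participants concrete, the following is a minimal sketch of a one-level classification tree (a decision stump) chosen by Gini impurity. It is a generic illustration, not the CTA procedure used in the paper, and every name and value in it (the features `age` and `prior_visits`, the eight toy records, the helper functions) is a hypothetical assumption made for this example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure or empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_labels(rows, labels, feature, threshold):
    """Partition labels by the rule rows[i][feature] <= threshold."""
    left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
    right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
    return left, right

def fit_stump(rows, labels):
    """Pick the single (feature, threshold) rule with the lowest weighted Gini impurity."""
    n = len(rows)
    best = None  # (weighted impurity, feature index, threshold)
    for feature in range(len(rows[0])):
        for threshold in sorted({r[feature] for r in rows}):
            left, right = split_labels(rows, labels, feature, threshold)
            if not left or not right:
                continue  # a rule must send records to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    _, feature, threshold = best
    left, right = split_labels(rows, labels, feature, threshold)
    return {
        "feature": feature,
        "threshold": threshold,
        # Each side predicts its majority class.
        "left": Counter(left).most_common(1)[0][0],
        "right": Counter(right).most_common(1)[0][0],
    }

def predict(stump, row):
    """Apply the learned rule to one record."""
    side = "left" if row[stump["feature"]] <= stump["threshold"] else "right"
    return stump[side]

# Hypothetical toy records: (age, prior_visits); label 1 = participant, 0 = non-participant.
feature_names = ["age", "prior_visits"]
rows = [(34, 1), (41, 2), (29, 0), (52, 6), (63, 8), (58, 5), (36, 7), (38, 1)]
labels = [0, 0, 0, 1, 1, 1, 1, 0]

stump = fit_stump(rows, labels)
accuracy = sum(predict(stump, r) == y for r, y in zip(rows, labels)) / len(rows)

print(f"Rule: {feature_names[stump['feature']]} <= {stump['threshold']} "
      f"-> class {stump['left']}, otherwise class {stump['right']}")
print(f"Training accuracy: {accuracy:.2f}")
```

The learned rule can be read directly as a statement about who participates, which is the kind of interpretability the abstract attributes to tree-based rules. Accuracy here is measured on the same toy records used for fitting, so it says nothing about performance on real data.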
Similar works
Applied logistic regression
1990 · 35,656 citations
The central role of the propensity score in observational studies for causal effects
1983 · 30,745 citations
SPSS and SAS procedures for estimating indirect effects in simple mediation models
2004 · 17,126 citations
A Proportional Hazards Model for the Subdistribution of a Competing Risk
1999 · 13,503 citations
Asymptotic Confidence Intervals for Indirect Effects in Structural Equation Models
1982 · 12,623 citations