This is an overview page with metadata for this scientific work. The full article is available from the publisher.
A simple heuristic for blindfolded record linkage
Citations: 50
Authors: 4
Year: 2012
Abstract
OBJECTIVES: To address the challenge of balancing privacy with the need to create cross-site research registry records on individual patients, while matching the data for a given patient as he or she moves between participating sites. To evaluate the strategy of generating anonymous identifiers based on real identifiers in such a way that the chances of a shared patient being accurately identified were maximized, and the chances of incorrectly joining two records belonging to different people were minimized.

METHODS: Our hypothesis was that most variation in names occurs after the first two letters, and that date of birth is highly reliable, so a single match variable consisting of a hashed string built from the first two letters of the patient's first and last names plus their date of birth would have the desired characteristics. We compared and contrasted the match algorithm characteristics (rate of false positive v. rate of false negative) for our chosen variable against both Social Security Numbers and full names.

RESULTS: In a data set of 19 000 records, a derived match variable consisting of a 2-character prefix from both first and last names combined with date of birth has a 97% sensitivity; by contrast, an anonymized identifier based on the patient's full names and date of birth has a sensitivity of only 87% and SSN has sensitivity 86%.

CONCLUSION: The approach we describe is most useful in situations where privacy policies preclude the full exchange of the identifiers required by more sophisticated and sensitive linkage algorithms. For data sets of sufficiently high quality this effective approach, while producing a lower rate of matching than more complex algorithms, has the merit of being easy to explain to institutional review boards, adheres to the minimum necessary rule of the HIPAA privacy rule, and is faster and less cumbersome to implement than a full probabilistic linkage.
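
The match variable described under METHODS (the first two letters of the first and last names plus the date of birth, hashed) might be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `match_key`, the normalization (trimming and uppercasing), the ISO date format, and the choice of SHA-256 are all assumptions, since the abstract does not specify them.

```python
import hashlib
from datetime import date


def match_key(first_name: str, last_name: str, dob: date) -> str:
    """Build a hashed match variable from name prefixes and date of birth.

    Hypothetical helper: the normalization rules, field order, date format,
    and hash function are illustrative assumptions, not taken from the paper.
    """
    # Assumed normalization: trim whitespace and ignore letter case.
    first_prefix = first_name.strip().upper()[:2]
    last_prefix = last_name.strip().upper()[:2]

    # Assumed layout: prefixes followed by the date of birth as YYYY-MM-DD.
    raw = f"{first_prefix}{last_prefix}{dob.isoformat()}"

    # SHA-256 is an assumed choice; the abstract only says "hashed string".
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


# Spelling variation after the second letter does not change the key.
key_a = match_key("Jonathan", "Smith", date(1970, 1, 2))
key_b = match_key("Jon", "Smyth", date(1970, 1, 2))
print(key_a == key_b)
```

Under these assumptions, two sites that apply the same function could exchange only the resulting keys and join records on equality, without sharing names or dates of birth directly. Records that differ in either prefix or in the date of birth would receive different keys.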
Related works
The REDCap consortium: Building an international community of software platform partners
2019 · 23,421 citations
The FAIR Guiding Principles for scientific data management and stewardship
2016 · 17,334 citations
Bayesian Data Analysis
1995 · 13,754 citations
k-ANONYMITY: A MODEL FOR PROTECTING PRIVACY
2002 · 8,450 citations
Business Intelligence and Analytics: From Big Data to Big Impact
2012 · 5,974 citations