Detecting Mirror Patients in Clinical Trials: An AI-based Approach for Identifying Implausible Similarities Across Patients
Citations: 0
Authors: 5
Year: 2025
Abstract
RATIONALE In clinical trials, detecting implausibly high similarity between the data of different study subjects is critical to ensuring data integrity and participant uniqueness. Spirometry can provide a unique “signature” for each subject, because individual respiratory patterns, volumes, and flow rates form a distinctive physiological profile. An AI model that scans these data points can automatically detect subjects with potentially duplicated or anomalously similar spirometry profiles, which could otherwise go unnoticed in traditional manual review. This approach facilitates timely identification of high-risk data and supports a Risk-Based Monitoring (RBM) framework by proactively flagging data points, subjects, or study sites for additional review by human experts.
METHOD Our three-step approach compares the spirometry sessions of patients at the randomization visit. First, features are extracted for each combination of patients. Second, a random forest classifier predicts a similarity score for each combination from these features; the model was trained on 14,599 spirometry sessions from 1,463 patients. Third, similar patients are grouped into clusters that maximize the similarity score, and site-level statistics are computed for further analysis.
RESULTS The random forest model was validated on 15 simulated sites, each with a different number of patients (5 to 20) and a different share of patients involved in clusters (0% to 100%). Clusters, defined as data points recorded under different subject IDs but originating from the same person, were created from actual sessions of the same patient at different visits. The approach correctly classified 14 of the 15 simulated sites, for a sensitivity of 100% and a specificity of 88%; the single error was one site incorrectly flagged as implausible.
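The reported site-level figures can be checked with a small calculation. The abstract does not say how many of the 15 simulated sites actually contained duplicated patients, so the split used below (7 sites with clusters, 8 without) is an assumption: it is a breakdown that is consistent with 100% sensitivity, a single false positive, and a specificity that rounds to 88%.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from site-level confusion counts."""
    sensitivity = tp / (tp + fn)  # share of truly implausible sites that were flagged
    specificity = tn / (tn + fp)  # share of clean sites that were left unflagged
    return sensitivity, specificity


# ASSUMED split of the 15 simulated sites (not stated in the abstract):
# 7 sites with duplicated patients, all flagged; 8 clean sites, 1 flagged in error.
sens, spec = sensitivity_specificity(tp=7, fn=0, tn=7, fp=1)
print(f"sensitivity={sens:.0%}, specificity={spec:.1%}")
# prints: sensitivity=100%, specificity=87.5%
```

Under this assumed split, 14 of the 15 sites are classified correctly and the one error is a false positive, which matches the description in the results.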
CONCLUSIONS The developed AI-based methodology effectively identifies potential duplicate patient enrollments in clinical trials, ensuring accurate recruitment counts and maintaining the integrity of trial results. With this approach, sites with subject data demonstrating unusually high similarity can be flagged for further assessment of data integrity by human experts.
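The third step of the method, grouping similar patients and computing a site-level statistic, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract describes clusters that maximize the similarity score, whereas this sketch substitutes a simpler technique, thresholding the pairwise scores and taking connected components with union-find. The function names, the threshold value, the patient IDs, and the scores are all hypothetical; in the described pipeline the scores would come from the random forest classifier.

```python
from itertools import combinations


def cluster_patients(patient_ids, pair_scores, threshold):
    """Group patients linked by pairwise scores at or above the threshold.

    Simplified stand-in for the clustering step: clusters are the connected
    components of the graph whose edges are high-scoring patient pairs.
    """
    parent = {pid: pid for pid in patient_ids}

    def find(pid):
        # Follow parent links to the root, shortening the path on the way.
        while parent[pid] != pid:
            parent[pid] = parent[parent[pid]]
            pid = parent[pid]
        return pid

    for (a, b), score in pair_scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for pid in patient_ids:
        groups.setdefault(find(pid), []).append(pid)
    # A cluster needs at least two subject IDs.
    return [sorted(members) for members in groups.values() if len(members) > 1]


def fraction_in_clusters(patient_ids, clusters):
    """Site-level statistic: share of the site's patients that sit in any cluster."""
    return sum(len(c) for c in clusters) / len(patient_ids)


# Hypothetical site with five subject IDs and made-up similarity scores.
patients = ["P01", "P02", "P03", "P04", "P05"]
scores = {pair: 0.05 for pair in combinations(patients, 2)}
scores[("P01", "P03")] = 0.93
scores[("P03", "P05")] = 0.88

clusters = cluster_patients(patients, scores, threshold=0.8)
fraction = fraction_in_clusters(patients, clusters)
print(clusters, fraction)
# prints: [['P01', 'P03', 'P05']] 0.6
```

A site where a large share of subject IDs falls into clusters would be a candidate for the expert review described in the conclusions. Connected components can chain loosely related patients into one group, which is one reason a score-maximizing clustering may be preferred in practice.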