Document Type
Article
Publication Date
11-18-2022
Abstract
An app-based clinical trial enrolment process can contribute to duplicated records, carrying data management implications. Our objective was to identify duplicated records in real time in the Apple Heart Study (AHS). We leveraged personally identifiable information (PII) to develop a dissimilarity score (DS) using the Damerau-Levenshtein distance. For computational efficiency, we focused on four types of records at the highest risk of duplication. We used the receiver operating characteristic (ROC) curve and resampling methods to derive and validate a decision rule to classify duplicated records. We identified 16,398 (4%) duplicated participants, resulting in 419,297 unique participants out of a total of 438,435 possible. Our decision rule yielded a high positive predictive value (96%) with negligible impact on the trial's original findings. Our findings provide principled solutions for future digital trials. When establishing deduplication procedures for digital trials, we recommend collecting device identifiers in addition to participant identifiers; collecting and ensuring secure access to PII; conducting a pilot study to identify reasons for duplicated records; establishing an initial deduplication algorithm that can be refined; creating a data quality plan that informs refinement; and embedding the initial deduplication algorithm in the enrolment platform to ensure unique enrolment and linkage to previous records.
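The abstract does not specify how the dissimilarity score is assembled from the PII fields, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the restricted (optimal string alignment) variant of the Damerau-Levenshtein distance, averages the length-normalized distance over a set of fields, and uses hypothetical field names (`first_name`, `last_name`, `email`). In practice, a cut-off on such a score would be derived from labeled pairs, for example with an ROC analysis as the abstract describes.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Counts insertions, deletions, substitutions, and transpositions of
    two adjacent characters, each at a cost of 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "Jonh" vs "John"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def dissimilarity_score(rec_a: dict, rec_b: dict, fields: tuple) -> float:
    """Mean length-normalized edit distance over the given fields.

    Assumed scoring scheme (not from the source): 0.0 means every field
    is identical, 1.0 means every field is completely different.
    """
    total = 0.0
    for field in fields:
        x = rec_a.get(field, "").strip().lower()
        y = rec_b.get(field, "").strip().lower()
        longest = max(len(x), len(y))
        total += osa_distance(x, y) / longest if longest else 0.0
    return total / len(fields)


# Hypothetical records and field names, for illustration only.
FIELDS = ("first_name", "last_name", "email")
record_1 = {"first_name": "Jonh", "last_name": "Smith", "email": "js@example.com"}
record_2 = {"first_name": "John", "last_name": "Smith", "email": "js@example.com"}
score = dissimilarity_score(record_1, record_2, FIELDS)
```

Here the two records differ only by a transposed pair of letters in the first name, so the score is small. A pair would be flagged as a likely duplicate when its score falls below a chosen threshold.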
Recommended Citation
Garcia, Ariadna; Lee, Justin; Balasubramanian, Vidhya; Gardner, Rebecca; Gummidipundi, Santosh E.; Hung, Grace; Ferris, Todd; Cheung, Lauren; Desai, Sumbul; Granger, Christopher B.; Hills, Mellanie True; Kowey, Peter; Nag, Divya; Rumsfeld, John S.; Russo, Andrea M.; Stein, Jeffrey W.; Talati, Nisha; Tsay, David; Mahaffey, Kenneth W.; Perez, Marco V.; Turakhia, Mintu P.; Hedlin, Haley; and Desai, Manisha, "The Development of a Mobile App-Focused Deduplication Strategy for the Apple Heart Study That Informs Recommendations for Future Digital Trials" (2022). Department of Medicine Faculty Papers. Paper 403.
https://jdc.jefferson.edu/medfp/403
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
PubMed ID
36589778
Language
English
Comments
This article is the author's final published version in Stat, Volume 11, Issue 1, 2022, Article number e470.
The published version is available at https://doi.org/10.1002/sta4.470. Copyright © 2022 The Authors. Stat published by John Wiley & Sons Ltd.