

Systematic reviews involve searching multiple bibliographic databases to identify eligible studies. As this type of evidence synthesis is increasingly pursued, the use of various electronic platforms can help researchers improve the efficiency and quality of their research. We examined the accuracy and efficiency of commonly used electronic methods for flagging and removing duplicate references during this process. A heterogeneous sample of references was obtained by conducting a similar topical search in MEDLINE, Embase, Cochrane Central Register of Controlled Trials, and PsycINFO databases. References were de-duplicated via manual abstraction to create a benchmark set. The default settings were then used in Ovid multifile search, EndNote desktop, Mendeley, Zotero, Covidence, and Rayyan to de-duplicate the sample of references independently. Using the benchmark set as reference, the number of false-negative and false-positive duplicate references for each method was identified, and accuracy, sensitivity, and specificity were determined. We found that the most accurate methods for identifying duplicate references were Ovid, Covidence, and Rayyan. Ovid and Covidence possessed the highest specificity for identifying duplicate references, while Rayyan demonstrated the highest sensitivity.

As the scientific literature grows exponentially and research becomes increasingly interdisciplinary, accurate and high-throughput reference deduplication is vital in evidence synthesis studies (e.g., systematic reviews, meta-analyses) to ensure the completeness of datasets while reducing the manual screening burden. Existing tools fail to fulfill these emerging needs, as they are often labor-intensive, insufficient in accuracy, and limited to clinical fields. Here, we present RefDeduR, a text-normalization and decision-tree aided R package that enables accurate and high-throughput reference deduplication. We modularize the pipeline into text normalization, three-step exact matching, and two-step fuzzy matching processes. We also introduce a decision-tree algorithm, consider preprints when they co-exist with a peer-reviewed version, and provide actionable recommendations. Therefore, the tool is customizable, accurate, high-throughput, and practical. RefDeduR provides an effective solution to perform reference deduplication and represents a valuable advance in expanding the open-source toolkit to support evidence synthesis research.
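The modular idea described for RefDeduR (text normalization, then exact matching, then fuzzy matching) can be illustrated with a minimal Python sketch. This is not RefDeduR's code or its algorithm: the package is written in R and uses a three-step exact and two-step fuzzy procedure plus a decision tree, none of which is reproduced here. The record keys `title` and `doi`, the function names, and the 0.9 similarity threshold are all assumptions made for illustration.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text or "")
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def deduplicate(records, threshold=0.9):
    """Return (kept, flagged); flagged holds (duplicate, earlier_match) pairs.

    Each record is a dict with hypothetical "title" and "doi" keys.
    The threshold is an illustrative value, not a recommended setting.
    """
    kept = []      # (normalized_title, record) for every record retained
    flagged = []
    by_doi = {}
    by_title = {}
    for rec in records:
        doi = (rec.get("doi") or "").strip().lower()
        title = normalize(rec.get("title"))

        # Exact matching: identical DOI first, then identical normalized title.
        match = by_doi.get(doi) if doi else None
        if match is None and title:
            match = by_title.get(title)

        # Fuzzy matching: near-identical normalized titles (pairwise scan).
        if match is None and title:
            for other_title, other in kept:
                if other_title and SequenceMatcher(None, title, other_title).ratio() >= threshold:
                    match = other
                    break

        if match is not None:
            flagged.append((rec, match))
            continue
        kept.append((title, rec))
        if doi:
            by_doi[doi] = rec
        if title:
            by_title[title] = rec
    return [rec for _, rec in kept], flagged
```

The sketch flags candidates rather than deleting them, so a reviewer can still inspect each pair. It compares titles only, so it would treat a preprint and its peer-reviewed version as duplicates without distinguishing them, and the pairwise fuzzy scan is quadratic in the number of retained records.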

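The evaluation step in the benchmark study, where each tool's output is compared against the manually de-duplicated set, reduces to a confusion matrix. The sketch below uses the standard definitions of accuracy, sensitivity, and specificity. The function name, the use of reference IDs, and the assumption that every reference is labeled either duplicate or unique are illustrative choices, since the abstract does not spell out how the counts were tallied.

```python
def dedup_metrics(benchmark_duplicates, flagged_duplicates, all_ids):
    """Score one de-duplication method against a manual benchmark.

    benchmark_duplicates: IDs the manual benchmark marks as duplicates.
    flagged_duplicates:   IDs the tool under test marked as duplicates.
    all_ids:              every reference ID in the sample.
    """
    universe = set(all_ids)
    benchmark = set(benchmark_duplicates)
    flagged = set(flagged_duplicates)
    if not (benchmark <= universe and flagged <= universe):
        raise ValueError("duplicate IDs must be drawn from all_ids")

    tp = len(flagged & benchmark)             # true duplicates that were flagged
    fp = len(flagged - benchmark)             # unique references wrongly flagged
    fn = len(benchmark - flagged)             # true duplicates that were missed
    tn = len(universe - flagged - benchmark)  # unique references left alone

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(universe)),
    }
```

Under these definitions, a tool that flags aggressively tends toward higher sensitivity at the cost of false positives, while a conservative tool tends toward higher specificity, which matches the kind of trade-off the study reports between Rayyan and Ovid or Covidence.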