This function deduplicates citation data. Note that duplicates are assumed to be published in the same journal, so pre-prints and similar records will not be identified as duplicates here.
Usage
dedup_citations(
raw_citations,
manual = FALSE,
shiny_progress = FALSE,
show_unknown_tags = FALSE
)
Arguments
- raw_citations
Citation dataframe with relevant columns
- manual
logical. If TRUE, manually specify pairs of duplicates to merge. Default is FALSE.
- shiny_progress
logical. If TRUE, show a progress bar in the Shiny app. Default is FALSE.
Examples
# Load example data from the package
examplecitations_path <- system.file("extdata", "examplecitations.rds", package = "CiteSource")
examplecitations <- readRDS(examplecitations_path)
# Deduplicate citations
dedup_results <- dedup_citations(examplecitations)
#> formatting data...
#> Warning: Search contains missing values for the record_id column. A record_id will be created using row numbers
#> identifying potential duplicates...
#> identified duplicates!
#> Joining with `by = join_by(record_id)`
#> flagging potential pairs for manual dedup...
#> Warning: There were 2 warnings in `mutate()`.
#> The first warning was:
#> ℹ In argument: `min_id = min(duplicate_id.x, duplicate_id.y)`.
#> Caused by warning in `min()`:
#> ! no non-missing arguments, returning NA
#> ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
#> Joining with `by = join_by(duplicate_id.x, duplicate_id.y)`
#> 165 citations loaded...
#> 67 duplicate citations removed...
#> 98 unique citations remaining!
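The same-journal assumption noted in the description can be illustrated with a small base R sketch. It does not call the package and is not the matching logic used by dedup_citations(); the toy records, the column names (title, journal, year), and the title-plus-journal key are all assumptions made up for illustration. It only shows why a pre-print that shares a title with a journal article would survive a deduplication that requires the journal to match.

```r
# Hypothetical toy records (not taken from the package's example data):
# two copies of a journal article and one pre-print with the same title.
toy <- data.frame(
  title   = c("Deep learning for ecology",
              "Deep learning for ecology ",
              "Deep learning for ecology"),
  journal = c("Ecology Letters", "ecology letters", "bioRxiv"),
  year    = c(2021, 2021, 2020),
  stringsAsFactors = FALSE
)

# Illustrative matching key: normalised title plus normalised journal.
# This is a stand-in for "same journal" matching, NOT the package's algorithm.
key <- paste(tolower(trimws(toy$title)),
             tolower(trimws(toy$journal)),
             sep = "||")

# duplicated() flags the second and later occurrences of each key.
toy$is_duplicate <- duplicated(key)

# Keep only the first occurrence of each key.
toy_unique <- toy[!toy$is_duplicate, ]

print(toy_unique[, c("title", "journal", "year")])
```

In this sketch the second journal record is dropped, while the pre-print remains because its journal field differs. If pre-prints need to be merged with their published versions, they would have to be handled separately, for example by manual review.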