This function deduplicates citation data. Note that duplicates are assumed to be published in the same journal, so pre-prints and similar records will not be identified as duplicates here.
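
The same-journal assumption can be illustrated with a small base R sketch. This is not the matching logic used by dedup_citations; it is a toy exact-match example on hypothetical records (the `records` data frame and its columns are made up for illustration), showing why a pre-print hosted elsewhere would survive deduplication when the journal is part of the matching key.

```r
# Hypothetical toy records (not taken from the package's example data)
records <- data.frame(
  title   = c("Deep learning for birds",
              "Deep learning for birds",
              "Deep learning for birds"),
  journal = c("Ecology Letters", "Ecology Letters", "bioRxiv"),
  year    = c(2021, 2021, 2020),
  stringsAsFactors = FALSE
)

# Build a matching key that includes the journal (an assumption of this sketch)
key <- tolower(paste(records$title, records$journal, sep = "|"))

# duplicated() flags the second and later occurrences of each key
is_dup <- duplicated(key)

# Keep only the first occurrence of each key
unique_records <- records[!is_dup, ]
nrow(unique_records)
```

In this sketch the second "Ecology Letters" row is dropped, while the "bioRxiv" row is kept because its key differs, even though the title is identical.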

Usage

dedup_citations(
  raw_citations,
  manual = FALSE,
  shiny_progress = FALSE,
  show_unknown_tags = FALSE
)

Arguments

raw_citations

A data frame of citations containing the relevant columns.

manual

logical. If TRUE, manually specify pairs of duplicates to merge. Default is FALSE.

shiny_progress

logical. If TRUE, show a progress bar in the Shiny app. Default is FALSE.

Value

Unique citations, formatted for CiteSource.

Examples

# Load example data from the package
examplecitations_path <- system.file("extdata", "examplecitations.rds", package = "CiteSource")
examplecitations <- readRDS(examplecitations_path)

# Deduplicate citations
dedup_results <- dedup_citations(examplecitations)
#> formatting data...
#> Warning: Search contains missing values for the record_id column. A record_id will be created using row numbers
#> identifying potential duplicates...
#> identified duplicates!
#> Joining with `by = join_by(record_id)`
#> flagging potential pairs for manual dedup...
#> Warning: There were 2 warnings in `mutate()`.
#> The first warning was:
#>  In argument: `min_id = min(duplicate_id.x, duplicate_id.y)`.
#> Caused by warning in `min()`:
#> ! no non-missing arguments, returning NA
#>  Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
#> Joining with `by = join_by(duplicate_id.x, duplicate_id.y)`
#> 165 citations loaded...
#> 67 duplicate citations removed...
#> 98 unique citations remaining!