Find Partially Duplicated Rows in a Data Frame — findPartialDuplicates • kwb.utils

Find rows in a data frame that are identical in the key columns but not identical in all columns.

Usage

findPartialDuplicates(data, key_columns, skip_columns = NULL)

Arguments

data: data frame
key_columns: names of columns in data in which to look for duplicated (combined) values
skip_columns: names of columns to be skipped when looking for duplicated rows

Value

NULL if there are no rows in data that have identical values in the key_columns, or if all groups of rows that have identical values in the key_columns are also identical in all the other columns (except for those named in skip_columns). Otherwise a list is returned with one element per duplicate in the key columns. The list elements are subsets of data representing the rows of data that are identical in the key columns but different in at least one of the other columns.
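
The return value described above can be illustrated with a minimal base R sketch. This is not the kwb.utils source: the function name find_partial_duplicates_sketch is hypothetical, and the choice to return all columns of data (including those named in skip_columns) is an assumption made for the illustration.

```r
# Hypothetical re-implementation for illustration only; not the kwb.utils code.
find_partial_duplicates_sketch <- function(data, key_columns, skip_columns = NULL)
{
  stopifnot(is.data.frame(data), all(key_columns %in% names(data)))

  # Columns taking part in the comparison (key columns are always kept)
  columns <- union(key_columns, setdiff(names(data), skip_columns))

  # Build one string per row from the combined key values and group the
  # row indices by that string, in order of first appearance
  keys <- do.call(paste, c(unname(as.list(data[key_columns])), sep = "\r"))
  groups <- split(seq_len(nrow(data)), factor(keys, levels = unique(keys)))

  # A group is "partially duplicated" if it has more than one row and the
  # compared columns are not identical across all of its rows
  is_partial <- vapply(groups, function(rows) {
    length(rows) > 1L && nrow(unique(data[rows, columns, drop = FALSE])) > 1L
  }, logical(1))

  if (!any(is_partial)) {
    return(NULL)
  }

  # Assumption: return the full rows of data, including skipped columns
  unname(lapply(groups[is_partial], function(rows) data[rows, , drop = FALSE]))
}
```

Because the result is NULL when nothing is found, a caller would typically test it with is.null() before looping over the list elements.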

Examples

findPartialDuplicates(key_columns = "id", data = rbind(
  data.frame(id = 1, value = 1),
  data.frame(id = 2, value = 2),
  data.frame(id = 2, value = 3),
  data.frame(id = 3, value = 3),
  data.frame(id = 3, value = 3),
  data.frame(id = 3, value = 3.1)
))
#> [[1]]
#>   id value
#> 2  2     2
#> 3  2     3
#> 
#> [[2]]
#>   id value
#> 4  3   3.0
#> 5  3   3.0
#> 6  3   3.1
#>