Find Partially Duplicated Rows in a Data Frame
Source: R/findPartialDuplicates.R
Find rows in a data frame that are identical in the key columns but not identical in all columns.
Arguments
- data
data frame
- key_columns
names of columns in data in which to look for duplicated (combined) values
- skip_columns
names of columns to be skipped when looking for duplicated rows
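To illustrate what "combined" values means for key_columns, here is a small base R sketch. It does not call findPartialDuplicates; the data frame `df` and its columns `id`, `year` and `value` are made up for illustration. A row counts as a repeat only if the combination of all key columns has occurred before:

```r
# Hypothetical example data: two key columns and one value column
df <- data.frame(
  id = c(1, 1, 2, 2),
  year = c(2020, 2021, 2020, 2020),
  value = c(10, 11, 12, 13)
)

key_columns <- c("id", "year")

# duplicated() on a data frame compares whole rows, i.e. the combination
# of all selected columns, not each column on its own
is_repeated <- duplicated(df[, key_columns, drop = FALSE])

is_repeated
#> [1] FALSE FALSE FALSE  TRUE
```

Only the fourth row repeats an earlier (id, year) pair, although the id value 1 and the year value 2020 each occur more than once on their own.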
Value
NULL if there are no rows in data that have identical
values in the key_columns or if all groups of rows that have
identical values in the key_columns are also identical in all the
other columns (except for those named in skip_columns). Otherwise
a list is returned with one element per duplicate in the key columns.
The list elements are subsets of data representing the rows of
data that are identical in the key columns but different in at least
one of the other columns.
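The return value described above could be produced along the following lines. This is a minimal base R sketch under stated assumptions, not the package's actual implementation: the helper name `sketchPartialDuplicates` is hypothetical, and the choice to return whole groups (including rows that are identical to each other) is inferred from the example output on this page.

```r
# Sketch only: a hypothetical helper, not the code of findPartialDuplicates()
sketchPartialDuplicates <- function(data, key_columns, skip_columns = NULL)
{
  # Columns that are compared in addition to the key columns
  other_columns <- setdiff(names(data), c(key_columns, skip_columns))

  # Nothing left to compare: rows cannot differ outside the key columns
  if (length(other_columns) == 0L) {
    return(NULL)
  }

  # One string per row that encodes the combination of key values
  # (assumption: pasting is precise enough to tell the key values apart)
  keys <- do.call(paste, c(unname(as.list(data[key_columns])), sep = "\r"))

  # Split the rows by key combination, in order of first occurrence
  groups <- unname(split(data, factor(keys, levels = unique(keys))))

  # Keep groups of more than one row that differ in some other column
  is_partial <- vapply(groups, FUN.VALUE = logical(1), FUN = function(group) {
    nrow(group) > 1L && nrow(unique(group[other_columns])) > 1L
  })

  if (any(is_partial)) groups[is_partial] else NULL
}

data <- data.frame(id = c(1, 2, 2, 3, 3, 3), value = c(1, 2, 3, 3, 3, 3.1))

# Two groups (id 2 and id 3) differ in the column "value"
result <- sketchPartialDuplicates(data, key_columns = "id")
length(result)
#> [1] 2

# With "value" skipped, no column is left in which rows could differ
is.null(sketchPartialDuplicates(data, "id", skip_columns = "value"))
#> [1] TRUE
```

Splitting by key combination and then checking each group keeps the sketch short; for large data frames a vectorised comparison would likely be faster.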
Examples
findPartialDuplicates(key_columns = "id", data = rbind(
data.frame(id = 1, value = 1),
data.frame(id = 2, value = 2),
data.frame(id = 2, value = 3),
data.frame(id = 3, value = 3),
data.frame(id = 3, value = 3),
data.frame(id = 3, value = 3.1)
))
#> [[1]]
#>   id value
#> 2  2     2
#> 3  2     3
#>
#> [[2]]
#>   id value
#> 4  3   3.0
#> 5  3   3.0
#> 6  3   3.1
#>