Find Partially Duplicated Rows in a Data Frame
Source: R/findPartialDuplicates.R
findPartialDuplicates.Rd
Find rows in a data frame that are identical in the key columns but not identical in all columns.
Arguments
- data
data frame
- key_columns
names of columns in data in which to look for duplicated (combined) values
- skip_columns
names of columns to be skipped when looking for duplicated rows
Value
NULL if there are no rows in data that have identical values in the key_columns, or if all groups of rows that have identical values in the key_columns are also identical in all the other columns (except for those named in skip_columns). Otherwise a list is returned with one element per duplicated combination of values in the key columns. Each list element is a subset of data containing the rows that are identical in the key columns but different in at least one of the other columns.
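The return value described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the function name find_partial_duplicates_sketch and its internals are hypothetical, and it assumes that skip_columns are simply excluded from the comparison of rows.

```r
# Hypothetical re-implementation for illustration only; not the package's code.
find_partial_duplicates_sketch <- function(data, key_columns, skip_columns = NULL) {
  # Columns that are compared when deciding whether rows are fully identical
  compare_columns <- setdiff(names(data), skip_columns)

  # One string key per row, built from the combined key column values
  keys <- do.call(paste, c(unname(as.list(data[key_columns])), sep = "\r"))

  # Split row indices by key, keeping the order of first appearance
  groups <- split(seq_len(nrow(data)), factor(keys, levels = unique(keys)))

  # Keep groups with more than one row whose rows are not all identical
  is_partial <- vapply(groups, function(rows) {
    length(rows) > 1L &&
      nrow(unique(data[rows, compare_columns, drop = FALSE])) > 1L
  }, logical(1))

  # No partially duplicated groups: return NULL, as described above
  if (!any(is_partial)) {
    return(NULL)
  }

  # Otherwise return one subset of data per partially duplicated key
  unname(lapply(groups[is_partial], function(rows) data[rows, , drop = FALSE]))
}
```

Under these assumptions, a group is reported only if it contains at least two distinct rows, and the reported subset includes every row of that group, including rows that are exact copies of each other.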
Examples
findPartialDuplicates(key_columns = "id", data = rbind(
  data.frame(id = 1, value = 1),
  data.frame(id = 2, value = 2),
  data.frame(id = 2, value = 3),
  data.frame(id = 3, value = 3),
  data.frame(id = 3, value = 3),
  data.frame(id = 3, value = 3.1)
))
#> [[1]]
#>   id value
#> 2  2     2
#> 3  2     3
#>
#> [[2]]
#>   id value
#> 4  3   3.0
#> 5  3   3.0
#> 6  3   3.1
#>