Apply Groups of Filter Criteria from Configuration
Usage
apply_filters(data, groups, length_column = NULL, id_columns = names(data)[1L])
Arguments
- data
data frame
- groups
names of filter criteria groups defined in the list returned by kwb.prep:::read_filter_criteria
- length_column
name of the column in data containing lengths (to be summed up for the overview that is returned)
- id_columns
names of the column(s) in data that uniquely identify the records. These columns are returned in order to report on the records that have been removed.
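The default for id_columns is the first column of data, which does not necessarily identify records uniquely. The following base R sketch checks this for the iris dataset and shows one possible way to add an identifier. The column name "id" and the variable names are assumptions made for illustration; they are not part of kwb.prep.

```r
# Assumption: iris stands in for the user's data; "id" is a hypothetical
# column name chosen for this sketch
first_column <- names(iris)[1L]  # the default value of id_columns

# TRUE if at least one value of the first column occurs more than once
has_duplicates <- anyDuplicated(iris[[first_column]]) > 0

# Prepend a running record number that identifies each row uniquely
data_with_id <- cbind(id = seq_len(nrow(iris)), iris)

# TRUE if no value of the new column occurs more than once
is_unique <- anyDuplicated(data_with_id$id) == 0
```

A data frame prepared like this could then be passed as data together with id_columns = "id".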
Value
data, filtered according to the specified criteria. The returned data frame has an attribute filter_info, a list with as many elements as there are groups. The elements are named according to the values given in groups. Each list element is a list with one element overview (a data frame with one row per filter criterion) and further elements removed_<i>, data frames containing only the id_columns of the records that were removed in the corresponding filter step i.
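The structure described above can be illustrated with a minimal base R sketch. It is not the implementation used by kwb.prep: the helper name sketch_apply_group is hypothetical, the overview element is omitted, and treating NA as "condition not met" is an assumption. The sketch evaluates each condition on the full data frame, keeps only the rows that meet all conditions, and stores the identifiers of the rows failing each condition in elements named removed_<i>.

```r
# Hypothetical helper (not part of kwb.prep): apply named "keep" conditions
# and record which records fail each of them
sketch_apply_group <- function(data, conditions, id_columns = names(data)[1L]) {
  keep <- rep(TRUE, nrow(data))
  removed <- list()
  for (i in seq_along(conditions)) {
    # Evaluate the condition string with the columns of data as variables
    matches <- eval(parse(text = conditions[[i]]), envir = data)
    # Assumption: NA is treated as "condition not met"
    matches <- !is.na(matches) & matches
    # Identifiers of the records that do not meet condition i
    removed[[paste0("removed_", i)]] <- data[!matches, id_columns, drop = FALSE]
    # A record is kept only if it meets all conditions so far
    keep <- keep & matches
  }
  result <- data[keep, , drop = FALSE]
  attr(result, "filter_info") <- removed
  result
}

sepal <- c(
  "sepal short" = "Sepal.Length < 5",
  "sepal narrow" = "Sepal.Width < 3"
)

filtered <- sketch_apply_group(iris, sepal)
info <- attr(filtered, "filter_info")

nrow(filtered)  # number of records meeting both conditions
names(info)     # "removed_1" "removed_2"
```

In the real return value, the attribute holds one such list per group, named after the group, so the identifiers removed in step 1 of group "sepal" would be found in attr(result, "filter_info")$sepal$removed_1.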
Examples
# Define filter criteria
criteria <- list(
sepal = c(
"sepal short" = "Sepal.Length < 5",
"sepal narrow" = "Sepal.Width < 3"
),
petal = c(
"petal short" = "Petal.Length < 5",
"petal narrow" = "Petal.Width < 3"
)
)
# Write criteria to temporary yaml file
tdir <- tempdir()
yaml::write_yaml(criteria, file.path(tdir, "filter_criteria.yml"))
# Set path to temporary "config" folder so that kwb.prep knows about it
set_user_config_dir(tdir)
# Apply filter groups "sepal" and "petal" to the iris dataset
result <- apply_filters(iris, c("sepal", "petal"))
#> ## Filterschritte 'sepal' anwenden auf 'sepal'
#>
#> ## Filterschritte 'sepal' anwenden auf 'sepal'
#>
#> ```{r eval = FALSE}
#> Evaluating Sepal.Length < 5 ...
#> is TRUE for 22 rows ( 14.7 %),
#> FALSE for 128 rows ( 85.3 %) and
#> NA for 0 rows ( 0.0 %).
#> Selected rows now: 22
#> Evaluating Sepal.Width < 3 ...
#> is TRUE for 57 rows ( 38.0 %),
#> FALSE for 93 rows ( 62.0 %) and
#> NA for 0 rows ( 0.0 %).
#> Selected rows now: 4
#> ```
#>
#> Eine Uebersicht ueber die entfernten Haltungen und deren Laengen kann nicht erzeugt werden, da
#>
#> - keine Laengenangaben uebergeben wurden
#>
#> ## Filterschritte 'petal' anwenden auf 'petal'
#>
#> ## Filterschritte 'petal' anwenden auf 'petal'
#>
#> ```{r eval = FALSE}
#> Evaluating Petal.Length < 5 ...
#> is TRUE for 4 rows (100.0 %),
#> FALSE for 0 rows ( 0.0 %) and
#> NA for 0 rows ( 0.0 %).
#> Selected rows now: 4
#> Evaluating Petal.Width < 3 ...
#> is TRUE for 4 rows (100.0 %),
#> FALSE for 0 rows ( 0.0 %) and
#> NA for 0 rows ( 0.0 %).
#> Selected rows now: 4
#> ```
#>
#> Eine Uebersicht ueber die entfernten Haltungen und deren Laengen kann nicht erzeugt werden, da
#>
#> - keine Laengenangaben uebergeben wurden
#>
# Have a look at the result
str(result)
#> 'data.frame': 4 obs. of 5 variables:
#> $ Sepal.Length: num 4.4 4.5 4.9 4.9
#> $ Sepal.Width : num 2.9 2.3 2.4 2.5
#> $ Petal.Length: num 1.4 1.3 3.3 4.5
#> $ Petal.Width : num 0.2 0.3 1 1.7
#> $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 2 3
#> - attr(*, "filter_info")=List of 2
#> ..$ sepal:List of 3
#> .. ..$ overview :'data.frame': 2 obs. of 5 variables:
#> .. .. ..$ CleaningStep : chr [1:2] "Keep 'Sepal.Length < 5'" "Keep 'Sepal.Width < 3'"
#> .. .. ..$ Count.go : int [1:2] 128 93
#> .. .. ..$ Percentage.go : num [1:2] 85.3 62
#> .. .. ..$ Count.keep : int [1:2] 22 4
#> .. .. ..$ Percentage.keep: num [1:2] 14.67 2.67
#> .. ..$ removed_1:'data.frame': 128 obs. of 1 variable:
#> .. .. ..$ Sepal.Length: num [1:128] 5.1 5 5.4 5 5.4 5.8 5.7 5.4 5.1 5.7 ...
#> .. ..$ removed_2:'data.frame': 93 obs. of 1 variable:
#> .. .. ..$ Sepal.Length: num [1:93] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.9 5.4 ...
#> ..$ petal:List of 3
#> .. ..$ overview :'data.frame': 2 obs. of 5 variables:
#> .. .. ..$ CleaningStep : chr [1:2] "Keep 'Petal.Length < 5'" "Keep 'Petal.Width < 3'"
#> .. .. ..$ Count.go : int [1:2] 0 0
#> .. .. ..$ Percentage.go : int [1:2] 0 0
#> .. .. ..$ Count.keep : int [1:2] 4 4
#> .. .. ..$ Percentage.keep: num [1:2] 100 100
#> .. ..$ removed_1:'data.frame': 0 obs. of 1 variable:
#> .. .. ..$ Sepal.Length: num(0)
#> .. ..$ removed_2:'data.frame': 0 obs. of 1 variable:
#> .. .. ..$ Sepal.Length: num(0)